Using OE to build an XT ‘Service VM’

UPDATE: I’ve deleted the build scripts git repo mentioned in this post and rolled all of my OE project build scripts into one repo. Find it here: git://github.com/flihp/oe-build-scripts.git

UPDATE #2: I’ve written a more up-to-date post on similar work here: http://twobit.us/blog/2013/11/openembedded-xen-network-driver-vm/. The data in this post should be considered out of date.

Over the past few weeks I’ve run into a few misconceptions about XenClient XT and OpenEmbedded. First is that XT is some sort of magical system that mere mortals can’t customize. Second is that building a special-purpose, super small Linux image on OpenEmbedded is an insurmountable task. This post is an attempt to dispel both of these misconceptions and maybe even motivate some fun work in the process.

Don’t get me wrong though, this isn’t a trivial task and I didn’t start and end this work in one night. There’s still a bunch of work to do here; I’ll lay that out at the end. For now, I’ve put up the build scripts I threw together last night on GitHub. They’re super minimal and derived from another project. Get them here: https://github.com/flihp/transbridge-build-scripts

My goal here is to build a simple rootfs that XT can boot with a VM ‘type’ of ‘servicevm’. This is the type that the XT toolstack associates with the default ‘Network’ VM. Basically it will be a VM invisible to the user. Eventually I’d like for this example to be useful as a transparent network bridge suitable as an in-line filter or even as a ‘driver domain’. But let’s not get ahead of ourselves …

What image do I build?

The first thing you need to choose when building an image with OE is which MACHINE you’re building for. XT uses Xen for virtualization, so whatever platform you’re running it on dictates the MACHINE. Since XT only runs on Intel hardware it’s pretty safe to assume your system is compatible with generic i586. The basic qemux86 MACHINE in the oe-core layer builds for this, so for our purposes it’ll suffice. This is already set up in the local.conf in my build scripts.
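
If you’re setting up a build by hand instead of using my scripts, the relevant bit of local.conf is just the MACHINE assignment; a minimal sketch (the rest of the file being the usual local.conf fare):

MACHINE ?= "qemux86"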

To build the minimal core image that’s in the oe-core layer just run my build.sh script from the root of the repository. I like to tee the output to a log file for inspection in the event of a failure:

./build.sh | tee build.log

Now you should have a bunch of new stuff in ./tmp-eglibc/deploy/images/ which includes an ext3 rootfs. The file name should be something like core-image-minimal.ext3. Copy this over to your XT dom0 and get ready to build a VM.
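
Something like the following works for the copy, assuming you have ssh access to your dom0 (‘xt-dom0’ here is a stand-in for your machine’s address, and I’m naming the image after the VHD we’re about to create):

scp ./tmp-eglibc/deploy/images/core-image-minimal.ext3 root@xt-dom0:/storage/disks/transbridge.ext3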

Make a VHD

The next thing to do is copy the ext3 image over to a VHD. From within /storage/disks create a new VHD large enough to hold the image. I’ve experimented with both core-image-basic and core-image-minimal and a 100M VHD will be large enough … yes that’s a very small rootfs. core-image-minimal is around 9M:

cd /storage/disks
vhd-util create -n transbridge.vhd -s 100

Next have tap-ctl create a new device node for the VHD:

tap-ctl create -a vhd:/storage/disks/transbridge.vhd

This will output the path to the device node created (and yeah the weird command syntax bugs me too). You can alternatively list the current blktap devices and find yours there:

tap-ctl list
1276    0    0        vhd /storage/ndvm/ndvm.vhd
1281    1    0        vhd /storage/ndvm/ndvm-swap.vhd
...

The first number is the PID of the tapdisk process backing the device, the second is the device’s minor number (the third is its state), and the last column is the VHD file backing it. Find your VHD in that last column and note the minor: the device node will be /dev/xen/blktap-2/tapdevX where X is the minor number. Mine was ‘8’ so that’s what I’ll use in this example.
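
If you’d rather not eyeball the output, a one-liner along these lines (a sketch leaning on the column layout described above) prints the device node directly:

tap-ctl list | awk '/transbridge.vhd/ { print "/dev/xen/blktap-2/tapdev" $2 }'

Either way, once you’ve got the device node, just byte-copy your ext3 image onto it: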

dd if=/storage/disks/transbridge.ext3 of=/dev/xen/blktap-2/tapdev8

Then you can mount your VHD in dom0 to poke around:

mount /dev/xen/blktap-2/tapdev8 /media/hdd

Where’s my kernel?

Yeah so OE doesn’t put a kernel on the rootfs for qemu machines. That’s part of why the core-image-minimal image is so damn small. QEMU doesn’t boot like regular hardware: you pass it the kernel on the command line, so OE’s doing the right thing here. If you want the kernel from the OE build it’ll be in ./tmp-eglibc/deploy/images/ with the images … but it won’t boot on XT 😦

This is a kernel configuration thing. I could have spent a few days creating a new meta layer and customizing the Yocto kernel to get a ‘Xen-ified’ image, but that sounds like a lot of work. I’m happy for this to be quick and dirty for the time being, so I just stole the kernel image from the XT ‘Network’ VM to see if I could get my VM booting.

You can do this too by first mounting the Network VM’s rootfs. The cool thing is you don’t need to power down the Network VM to mount its FS in dom0! The disk is exposed to the Network VM as a read-only device, so you can mount it read-only in dom0:

mount -o ro /dev/xen/blktap-2/tapdev0 /media/cf

Then just copy the kernel and modules over to your new rootfs and set up some symlinks to the kernel image so it’s easy to find:

cp /media/cf/boot/vmlinuz-2.6.32.12-0.7.1 /media/hdd/boot
cp -R /media/cf/lib/modules/2.6.32.12-0.7.1 /media/hdd/lib/modules
cd /media/hdd/boot
ln -s vmlinuz-2.6.32.12-0.7.1 vmlinuz
cd /media/hdd
ln -s ./boot/vmlinuz-2.6.32.12-0.7.1 vmlinuz

You may find that there isn’t enough space on the ext3 image you copied onto the VHD. Remember that the ext3 image is only as large as the disk image created by OE: its size won’t match the VHD you created unless you resize it to fill the full VHD. You can do so by first umount’ing the tapdev and then running resize2fs on it:

umount /media/hdd
resize2fs /dev/xen/blktap-2/tapdev8

This will make the file system on the VHD expand to fill the full virtual disk. If you made your VHD large enough you’ll have enough space for the kernel and modules. Like I say above, 100M is a safe number but you can go smaller.

Finally you’ll want to be able to log into your VM. If you picked the minimal image it won’t have ssh or anything, so you’ll need a getty listening on the Xen console device. Re-mount the tapdev and add the following line to your inittab:

echo -e "\nX:2345:respawn:/sbin/getty 115200 xvc0" >> /media/hdd/etc/inittab

There’s also gonna be a default getty trying to attach to ttyS0, which isn’t present. When the VM is up this will cause messages on the console:

respawning too fast: disabled for 5 minutes

You can disable this by removing the ‘S’ entry in inittab but really the proper solution is a new image with a proper inittab for an XT service VM … I’ll get there eventually.
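
For a quick fix in the meantime, something like this (assuming the offending entry is the one referencing ttyS0) drops it from the mounted image:

sed -i '/ttyS0/d' /media/hdd/etc/inittab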

Make it a VM

Up till now all we’ve got is a VHD file. Without a VM to run it nothing interesting is gonna happen, so now we make one. The XT toolstack isn’t documented to the point where someone can just read a man page, but it will tell you a lot about itself if you run it without any parameters. Honestly I know very little about our toolstack, so I’m always executing xec and grepping through the output.

After some experimentation here are the commands to create a new Linux VM from the provided template and modify it to be a para-virtualized service VM. In retrospect it may be better to use the ‘ndvm’ template but this is how I did it for better or for worse:

xec create-vm-with-template new-vm-linux

This command will output a path to the VM node in the XT configuration database. The name of the VM will also be something crazy. Get the name from the output of xec-vm and change it to something sensible like ‘minimal’:

xec-vm --name <generated-name> set name minimal

Your VM will also get a virtual CD-ROM, which we don’t want, so delete it and then add a disk for the VHD we configured:

xec-vm --name minimal --disk 0 delete
xec-vm --name minimal add-disk
xec-vm --name minimal --disk 0 set phys-path /storage/disks/transbridge.vhd

Then set all of the VM properties per the instructions provided in the XT Developer Guide:

xec-vm --name minimal --disk 0 set virt-path xvda
xec-vm --name minimal set flask-label "system_u:system_r:nilfvm_t"
xec-vm --name minimal set stubdom false
xec-vm --name minimal set hvm false
xec-vm --name minimal set qemu-dm-path ""
xec-vm --name minimal set slot -1
xec-vm --name minimal set type servicevm
xec-vm --name minimal set kernel /tmp/minimal-vmlinuz
xec-vm --name minimal set kernel-extract /vmlinuz
xec-vm --name minimal set cmd-line "root=/dev/xvda xencons=xvc0 console=xvc0 rw"

Then all that’s left is booting your new minimal VM:

xec-vm --name minimal start

You can then connect to the dom0 end of the Xen serial device to log into your VM:

screen $(xenstore-read /local/domain/$(xec-vm --name minimal get domid)/console/tty)

Next steps

This is a pretty rough set of instructions but it will produce a bootable VM on XenClient XT from a very small OpenEmbedded core-image-minimal. There are tons of places this can be cleaned up, starting with a kernel that’s specific to a Xen domU. A real OE DISTRO would be another welcome addition so various distro-specific features could be added and removed more easily. If the lazywebs feel like contributing some OE skills to this effort, leave me a comment.

What’s in a hash?

After the initial work on meta-measured it was very clear that configuring an MLE is great but alone it has little value. Sure tboot will measure things for you, it will even store these measurements in your TPM’s PCRs! But the “so what?” remains unanswered: there are hashes in your TPM, who cares?

Even after you’ve set up meta-measured, launched an MLE, and dumped out the contents of /sys/class/misc/tpm0/device/pcrs, what have you accomplished? The whole point of meta-measured was to set up the machinery to make this easier and for the PCR values to remain unchanged across a reboot. I was surprised at how much work went into just this. But after this work, the hashes in these PCRs still had no meaning beyond being mysterious, albeit static, hashes.
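
Pulling PCR[17] out of that sysfs file is a one-liner, for reference (the TPM 1.2 driver prints one ‘PCR-NN: …’ line per register):

grep PCR-17 /sys/class/misc/tpm0/device/pcrs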

I closed the meta-measured post stating my next goal was to take a stab at pre-computing some PCR values. Knowing the values that PCRs will have in your final running system allows for secrets to be protected by sealed storage at install time (which I’ve heard called ‘local attestation’ just to confuse things). Naturally the more system state involved in the sealing operation (assume this means ‘more PCRs’ for now) the better. So I had hoped to come back after a bit with the tools necessary for meta-measured to produce a manifest of as many of the tboot PCR values as possible.
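
To make ‘sealing’ concrete, here’s a sketch using the tpm-tools utilities (the file names are made up and the PCR selection is just an example):

tpm_sealdata -z -p 17 -p 18 -i secret.txt -o secret.sealed

The resulting blob can only be unsealed (see tpm_unsealdata) on the same TPM, and only while the selected PCRs hold the same values they had at seal time.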

Starting with PCR[17]

Naturally I started with what I knew would be the hardest PCR to calculate: the infamous PCR[17]. JP’s comment on my last post pointed out some of his heroic efforts to compute PCR[17], so that was a huge help. So first things first: respect to JP for the pointer. This task would have taken me twice as long were it not for his work and the work of others on tboot-devel.

So I set out to calculate PCR[17], but I think my approach was different from those I was able to find in the public domain. The criteria I came up with for my work were:

  1. Calculate PCR[17] for system A on system B.
  2. Do the measurements myself.

So ‘rule #1’ basically says: no reliance on having a console on the running system. This is one part technical purity, one part good design as the intent is to make these tools as flexible as possible and useful in a build system. ‘Rule #2’ is all technical purity. This isn’t an exercise in recreating the algorithm that produces the value that ends up in PCR[17].

This last bit is important. The whole point is to account for the actual things (software, configuration etc) that are measured as part of bringing up a TXT MLE. Once these are identified they need to be collected (maybe even extracted from the system) if possible, and then used to calculate the final hash stored in PCR[17]. So basically, no parsing and hashing the output from ‘txt-stat’, that’s cheating 🙂 I explained this approach to a friend and was instantly accused of masochism. That’s a good sign and I guess there’s an element of that in the approach as well, if not everything I do.
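
For the curious, the arithmetic itself is the easy part: a TPM 1.2 extend just chains SHA-1 hashes. Here’s a minimal sketch in Python with stand-in measurement inputs (identifying and collecting the real inputs is the whole game, and the subject of the next post):

import hashlib

def extend(pcr, measurement):
    # TPM 1.2 extend: PCR_new = SHA1(PCR_old || measurement)
    return hashlib.sha1(pcr + measurement).digest()

# PCR[17] is reset to 20 bytes of zeros by a successful GETSEC[SENTER]
pcr17 = b'\x00' * 20

# each 'thing' measured during the launch is hashed and extended in
# order; these blobs are placeholders, not the real measured inputs
for blob in (b'sinit-data', b'mle-hash', b'lcp-policy'):
    pcr17 = extend(pcr17, hashlib.sha1(blob).digest())

print(pcr17.hex())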

As always, wrapping an exploratory exercise up with learning / brushing up on a language is a good idea, right? So I did as much of my work on this as possible in Python. Naturally I had to break this rule and use some C at the end, but that’s a bit of a punchline so I don’t want to spoil the joke.

So if you’re only interested in the code I won’t bore you with any more talk about ‘goals’ and ‘design’. It’s all up on GitHub. The Python is here: https://github.com/flihp/pcr-calc. The C is here: https://github.com/flihp/pcr-calc_c. There isn’t much in the way of documentation but I’ll get into that soon.

If you are interested in the words that accompany this work, stay tuned. My next post will give a bit of a tour of the rabbit hole that is calculating PCR[17], including discussion of each ‘thing’ that’s measured and what it all means. Like I said though: the end result is that precalculating PCR[17] for arbitrary platforms is a massive PITA and likely not very useful for my original purposes. After thinking on it a bit, though, I suspect this info will be useful elsewhere; I’ll save that for discussion of follow-on work.