TPM2 Certificate Authority

Certificate authorities use some serious security measures to protect their signing keys (or at least they’re supposed to). These high security requirements are the realm of the traditional hardware security module (HSM). These things address threat models that include attacks against both software and hardware, sometimes very sophisticated ones (like decapping). You get what you pay for, and so HSMs can get pretty pricey. These extreme levels of security and the associated price seem reasonable to me, at least for the CAs whose keys are trusted by the likes of Chrome and Firefox (and subsequently all of their users).

Not all CAs need that kind of security though. Definitely not the CA I use for various test applications. For this I’ve always just kept the keys on a thumb drive as a way to keep them off of my laptop.

The type, strength and cost of the security measures appropriate for an application aren’t binary. The two example applications above are likely the extremes since the key protection schemes have such radically different security properties: HSM vs commodity OS storing keys / files on disk. But between these two applications are a spectrum of others with varying threat models.

root CA signing key in TPM2

What these applications are isn’t the interesting part so long as we’re willing to admit that they exist. What *is* interesting are the technologies available to mitigate relevant threats. Historically, there haven’t been many options available between the two extremes described above. Recently however some products have emerged to fill this space including TPM2. Microsoft added TPM2 to their logo requirements for Windows 10 so it’s effectively ubiquitous in newer laptops. TPM2 is an interesting option because it mitigates key theft through the use of “shielded locations” where sensitive operations (use of private keys) are carried out separate from the main CPU. For my particular application this is “good enough”.

background

I’ve always used the OpenSSL tools and the associated commands to manage my local CA. The available documentation and collective knowledge on the internet make this tool indispensable and I’ve got my workflow scripted. What I want to do is integrate the TPM2 OpenSSL engine into my existing scripts and configurations.

Documentation for building and installing tpm2tss openssl engine is here: https://github.com/tpm2-software/tpm2-tss-engine/blob/master/INSTALL.md. The rest of this document assumes you have it installed and properly configured.

make your root key

Using the TPM2 to protect your CA signing keys is surprisingly easy. I typically shy away from using the word “easy” when talking about the TPM but in this case, thanks to the TPM2 OpenSSL engine, it really is. The `tpm2tss` engine provides a binary `tpm2tss-genkey` for key generation. For this example a simple RSA 2k key is generated:


$ tpm2tss-genkey --alg rsa --keysize 2048 ca-root.key.tss

I’ve given the root key the extension `.tss` because it’s in a form unique to the tpm2tss engine.

Once you’ve got your CA root key you can use the `openssl` command line tool to generate a self-signed certificate for it.
NOTE: The details of the openssl configuration file used for root CA signing keys are beyond the scope of this document. I’m using unmodified versions of these same files from the exceptional OpenSSL Certificate Authority by Jamie Nguyen, openssl.cnf.


$ openssl req \
    -config openssl.cnf \
    -new -x509 \
    -engine tpm2tss \
    -key ca-root.key.tss \
    -keyform engine \
    -days 7300 \
    -sha256 \
    -extensions v3_ca \
    -out ca-root.cert

Notice that the options for this command include `-engine tpm2tss` as well as `-keyform engine`. This causes openssl to load the key through the engine when it generates and signs the CSR for the root key. The output is a self-signed cert for the `ca-root.key.tss` key. It’s possible to include the engine configuration in `openssl.cnf`; the options are given on the command line here for emphasis.

issuing subkeys

The rest of this CA stuff is mechanical: we use the `openssl` tool’s `req` and `ca` commands to issue subkeys for various purposes while providing the new engine-specific command line options. It’s all signing operations and certificate generation, and all of it is supported by the tpm2tss openssl engine. I followed Jamie Nguyen’s documentation above, substituting in use of the `tpm2tss` engine on the command line where appropriate, and everything worked as expected.
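To make that a bit more concrete, here is a rough sketch of what issuing one certificate might look like. It is not lifted from my scripts: the `run` helper, the function names, the `server_cert` extension section and the 375 day lifetime are all assumptions for illustration (the last two follow the conventions in Jamie Nguyen’s guide), and it assumes the subkey is generated in the TPM as well.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a TPM-protected key and a CSR for it. $1 is a base name.
make_csr() {
    run tpm2tss-genkey --alg rsa --keysize 2048 "$1.key.tss"
    run openssl req -config openssl.cnf -new -sha256 \
        -engine tpm2tss -keyform engine -key "$1.key.tss" \
        -out "$1.csr"
}

# Sign the CSR with the TPM-protected root key from the example above.
# The extension section name and lifetime are assumptions, not requirements.
sign_csr() {
    run openssl ca -config openssl.cnf -cert ca-root.cert \
        -engine tpm2tss -keyform engine -keyfile ca-root.key.tss \
        -extensions server_cert -days 375 -notext -md sha256 \
        -in "$1.csr" -out "$1.cert"
}
```

Running `DRY_RUN=1 make_csr www` followed by `DRY_RUN=1 sign_csr www` prints the commands without touching the TPM, which is a cheap way to check the options before doing it for real.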

conclusions etc

TPM2 is a powerful tool and thanks to the `tpm2-tss-engine` we can continue to use the `openssl` command line tools we all know and love while benefiting from the protections offered by TPM2. The example above is just an example though. Root signing keys are rarely used and so storing them offline or in a token like a yubikey is often the best choice. TPM2 may be a better fit for intermediate keys, like those on a signing server integrated into a CI pipeline. Might be fun to build an “ideal” home CA architecture with a yubikey for the root keys and an embedded platform with a TPM2 for issuing credentials as an intermediate CA for some application.

HP Elitebook 820 G1 laptop dock script

Upgraded my laptop from an ancient HP 2760p so it’s time to write a new script to handle plug / unplug events from the docking station. Wait, what? Write a new script? Haven’t I done this before? Sure, I wrote a docking script for my Thinkpad x61s, but that was 6 years ago. No way that script still works … no wait it does!

Minor tweaks to sort out some differences between the udev events for the thinkpad / elitebook and changes to the xrandr command for my current monitor setup and it’s good to go!

The thinkpad had better integration with the kernel ‘dock’ driver so it generated distinct ‘dock’ and ‘undock’ events. For the EliteBook I had to trigger on the DRM kernel subsystem getting a ‘change’ event on ‘card0’. Not sure what that means but it’s consistent for plug / unplug. To determine whether the ‘change’ is a dock or an undock I’m parsing xrandr output to see which monitor is present where. It’s not as nice as having distinct events but it’s pretty simple. So the udev rule looks like this:

SUBSYSTEM=="drm", KERNEL=="card0", ACTION=="change", RUN+="/usr/local/bin/dock.sh"

And the script I’m now using for dock/undocking my EliteBook 820 G1 on Debian 8.0 is here.
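This isn’t the script linked above, just a minimal sketch of the idea for anyone who wants the gist without following the link. The output name `eDP1` for the laptop panel and the function names are assumptions; check `xrandr --query` on your own machine for the real names.

```shell
#!/bin/sh
# Sketch of a dock/undock handler. The output name below is an assumption.
INTERNAL="eDP1"   # hypothetical name of the laptop panel

# Read `xrandr --query` output on stdin and print the names of connected
# outputs other than the internal panel, one per line.
external_outputs() {
    awk -v internal="$INTERNAL" \
        '$2 == "connected" && $1 != internal { print $1 }'
}

# Print "docked" if any external output is connected, else "undocked".
dock_state() {
    if [ -n "$(external_outputs)" ]; then
        echo docked
    else
        echo undocked
    fi
}

# Apply a layout for the detected state (only does something under X).
apply_layout() {
    case "$(xrandr --query | dock_state)" in
        docked)
            out=$(xrandr --query | external_outputs | head -n 1)
            xrandr --output "$out" --auto --right-of "$INTERNAL" ;;
        undocked)
            xrandr --auto ;;
    esac
}
```

A `dock.sh` built this way would just call `apply_layout`. Keep in mind that a script started by udev usually has no DISPLAY or XAUTHORITY set, so a real script would likely need to export both before calling xrandr.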

Thanks for doing me a solid past me. Well done.


awesome battery widget

I spent a few hours last night customizing the awesome window manager (wm) on my new laptop install. I’ve been using awesome for a while now (years) but I haven’t done much by way of customizing my setup. The default config from Debian, with a few small tweaks to default tags and layouts, has been sufficient. But having a battery gauge on my laptop is pretty important so I carved out a few minutes to set this up.

As always I’m not the first person to have this problem. Luckily those that came before me put their work up on github so all I had to do was clone awesome-batteryinfo, copy battery.lua into my ~/.config/awesome directory and integrate the battery widget into my rc.lua.

Integrating this widget is pretty painless. There are four steps: First you have lua pull in battery.lua:

require("battery")

Second you instantiate the widget:

mybatterywidget = widget({type = "textbox", name = "batterywidget", align = "right" })

Third you place the widget in a wibox. Debian has a wibox positioned across the top of each screen. I took the battery widget created above and added it to the wibox widget list alongside the layoutbox, textclock etc. My final mywibox.widgets looks like this:

mywibox[s].widgets = {
    {
        mylauncher,
        mytaglist[s],
        mypromptbox[s],
        layout = awful.widget.layout.horizontal.leftright
    },
    mylayoutbox[s],
    mytextclock,
    mybatterywidget,
    s == 1 and mysystray or nil,
    mytasklist[s],
    layout = awful.widget.layout.horizontal.rightleft
}

Finally I set up a timed event to update the widget every 20 seconds. I also seeded the widget text to show the battery data at the time the widget is created. This means that the widget will come up with meaningful data before the first event fires (20 seconds after init):

-- seed the battery widget: don't wait for first timer
mybatterywidget.text = batteryInfo("BAT0")
-- timer to update battery widget
mybatterywidget_timer = timer({timeout = 20})
mybatterywidget_timer:add_signal("timeout", function()
    mybatterywidget.text = batteryInfo("BAT0")
end)
mybatterywidget_timer:start()

That’s all there is to it. Thanks to koenwtje for the great widget. I should probably collect my awesome configs into a git repo so I don’t have to go back and rediscover how to do this every time I build a new system …

Getting serial output on my Ivy Bridge NUC

I’d been using a rather old Sandy bridge system (Intel DQ67EP + i7 2600S) to test my work on meta-measured for a long time. Very nice, very stable system. But with Intel getting out of the motherboard business I started eyeing their new venture: the NUC.

The DC53427HYE vPro IVB NUC

Everything is getting smaller and thankfully Intel has finally caught on. Better yet they’re supporting TXT on some of these systems and so when the Haswell NUC was released over the summer the price on the vPro Ivy Bridge NUC (DC53427HYE) finally dropped enough to put it in my price range. Intel opted to skip the vPro NUC for Haswell anyways so it was my only option.

Let the fun of testing TXT on a new system begin! Like any new system we hope it works “out of the box”. But with TXT, odds are it won’t. My SNB system was great but this NUC … not so much, yet. The kicker though is that as systems get smaller something’s got to give. Space ain’t free and … well who needs a serial port anyways right?

NUC IVB guts

Where’s my serial?

So without serial hardware, debugging TXT / tboot is pretty much a lost cause. Sure you can slow down the VGA output with the vga_delay command line option. But if you want to actually analyze the output you need to be able to capture the text somehow and setting vga_delay to a large value and then copying the output by hand doesn’t scale (and it’s a stupid idea to boot). So the search for serial output continues.

To get TXT we must … ::cough:: … endure the presence of the Management Engine (ME) and it’s supposed to have a serial console built in. The docs for the system even say you can get BIOS output from the ME serial console. But for whatever reason, I spent an afternoon messing about with it and made no progress.

I’ve no way to know where the problem with this lies. There are tools for accessing the ME serial console for Linux but I couldn’t get early boot output. Setting up a serial console login for a bare metal Linux system worked but no early boot stuff (BIOS, grub or tboot). Judging by the AMT docs for Linux: you can’t count on the ME serial interface for much. The docs state that if you use Xen then the ME will get the DHCP address all messed up and that setting a static address in the ME interface just doesn’t work. So long story short, the ME serial interface is limited at best and these limitations preclude getting early boot messages like those from tboot.

Now that the ME bashing is done we must fall back on real serial hardware. Thankfully this thing has both a half height and a full height mini-PCIe slot and a market for these arcane serial things still exists. StarTech fills this need with the 2s1p mini PCIe card. This is a great little piece of hardware but the I/O ports aren’t the default (likely to prevent conflict with on-board serial hardware) so we’ve gotta do some work before tboot will use it for output messages.

StarTech mini-PCIe serial card

NUC IVB with serial card

We have serial! Now what?

With some real serial hardware we’re half way there. Now we need to get tboot to talk to it. Unfortunately just adding serial to the logging= parameter in the boot config isn’t sufficient. The default base address for the serial I/O port used by tboot is 0x3F8 (see the README). This address corresponds to the “default serial port” aka COM1. So our shiny new mini-PCIe serial hardware must be using a different port.

tboot will log to an alternative port but we need to find the right I/O port address for the add on card. If you’re like me you keep a bootable Linux image on a USB drive handy for times like these. So we boot up the NUC and break out lspci to dump some data about our new serial card:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])

Not a bad start. This card has two serial ports and it shows up as two distinct serial devices. To get the I/O port base address we need to throw some -vvv at lspci. I’ll trim off the irrelevant bits:

02:00.0 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at e030 [size=8]
        Region 1: Memory at f7d05000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d04000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.
02:00.1 Serial controller: NetMos Technology PCIe 9912 Multi-I/O Controller (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 18
        Region 0: I/O ports at e020 [size=8]
        Region 1: Memory at f7d03000 (32-bit, non-prefetchable) [size=4K]
        Region 5: Memory at f7d02000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
.
.
.

The lines we care about here are:

Region 0: I/O ports at e030 [size=8]
Region 0: I/O ports at e020 [size=8]

So the I/O port address for 02:00.0 is 0xe030 and for 02:00.1 it’s 0xe020. The 9 pin headers on the board are labeled S1 and S2 so you can probably guess which is which. With the NUC booted off my Linux USB key we could dump more data about the hardware to know for sure, but with a serial cable hooked up to S1 I just threw some text at the device to see if something would come out the other end:

echo "test" > /dev/ttyS0

Sure enough I got "test" out. So I know my cable is hooked up to ttyS0. Now to associate /dev/ttyS0 with one of the PCI devices so we can get the I/O port. Poking around in sysfs is the thing to do here:

ls /sys/bus/pci/devices/02:00.0/tty/
ttyS0

With all of this we know we want tboot to log data to I/O port 0xe030 so we need the following options on the command line: logging=serial serial=115200,8n1,0xe030.
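If you’d rather not pick the address out of the lspci dump by eye, the lookup can be scripted. This is only a sketch: the function names are made up for illustration, and it assumes lspci prints device addresses without a PCI domain prefix, as in the output above.

```shell
#!/bin/sh
# Print the I/O port base (as 0x....) for one PCI device, reading
# `lspci -vvv` output on stdin. $1 is the bus address, e.g. 02:00.0.
io_port_for_device() {
    awk -v dev="$1" '
        # An unindented line starting with a hex digit begins a new device.
        /^[0-9a-f]/ { current = $1 }
        current == dev && /I\/O ports at/ {
            for (i = 1; i <= NF; i++)
                if ($i == "at") { print "0x" $(i + 1); exit }
        }
    '
}

# Build the tboot logging options for a given port base.
tboot_serial_args() {
    printf 'logging=serial serial=115200,8n1,%s\n' "$1"
}
```

On the NUC something like `tboot_serial_args "$(lspci -vvv | io_port_for_device 02:00.0)"` should print the same options given above.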

Next time

Now that I’ve got some real serial hardware and a way to get tboot to dump data out to it I can finally debug TXT / tboot. We’ll save that for next time.

twobit auto builder part 2

In my last post on automating my project builds with buildbot I covered the relevant buildbot configs. The issue that I left unresolved was the triggering of the builds. Buildbot has tons of infrastructure to do this and their simple polling mechanisms would be more than sufficient for handling my OE projects that typically have 3-5 git repos. But with OpenXT there are enough git repos to make polling with buildbot impractical. This post covers the last bit necessary to get builds triggering with a client side hook model.

twobit-git-poller

Now that we’ve worked out the steps necessary to build the software we need to work out how to trigger builds when the source code is updated. Buildbot has extensive infrastructure for triggering builds ranging from classes to poll various SCM systems in a “pull” approach (pulled by buildbot) to scripts that plug into existing SCM repos to run as hook scripts in a “push” approach (pushed by the SCM).

Both approaches have their limitations. The easiest way to trigger events is to use the built in buildbot polling classes. I did use this method for a while with my OE projects but with OpenXT it’s a lost cause. This is because the xenclient-oe.git repo clones the other OpenXT repos at the head of their respective master branches. This means that unlike other OE layers the software being built may change without a subsequent change being made to the meta layer. To use the built in buildbot pollers you’d have to configure one for each OXT git repo and there’s about 60 of them. That’s a lot of git pollers and would require a really long and redundant master config. The alternative here is to set up a push mechanism like the one provided by the buildbot in the form of the git_buildbot.py script.

This sounds easy enough but given the distributed nature of OpenXT it’s a bit harder than I expected, though not in a technical sense. Invoking a hook script from a git server is easy enough: just drop the script into the git repo, generally as a ‘post-receive’ hook, and every time someone pushes code to it the script will fire. If you control the git server where everyone pushes code this is easy enough. Github provides a similar mechanism but that would still make for a tight coupling of the build master to the OpenXT Github org and herein lies the problem.

Tightly coupling the build master to the OpenXT Github infrastructure is the last thing I want to do. If there’s just one builder for a project this approach would work but we’d have to come up with a way to accommodate others wanting to trigger their build master as well. This could produce a system where there’s an “official” builder and then everyone else would be left hanging. Building something that leaves us in this situation knowingly would be a very bad thing as it would solve a technical problem but produce a “people problem” and those are exponentially harder. The ideal solution here is one that provides equal access / tooling such that anyone can stand up a build master with equal access.

Client-side hooks

The only alternative I could come up with here is to mirror every OpenXT repo (all 60 of them) on my build system and then invent a client side hook / build triggering mechanism. I had to “invent” the client side hooks because Git has almost no triggers for events that happen on the client side (including mirrors). My solution for this is in the twobit-git-poller.

It’s really nothing special. Just a simple daemon that takes a config specifying a pile of git repos to mirror. I took a cue from Dickon and his work here by using a small bit of the Github API to walk through all of the OpenXT repos so that my config file doesn’t become huge.

The thing that makes this unique is some magic I do to fake a hook mechanism as much like post-receive as possible. This allows us to use a script like the git_buildbot.py unmodified. So I expanded the config file to specify an arbitrary script so others can expand on this idea but this script is expected to take input identical to that produced by the post-receive and expected by the git_buildbot.py script.

This is implemented with a naive algorithm: every time the git poller daemon is run we just iterate over the available branches, grab the HEAD of each before and after the fetch and then dump this data into the hook script. I don’t do anything to detect whether or not there was a change even, I just dump data into the hook script. This allows us to use the git_buildbot.py script unmodified and buildbot is smart enough to ignore a hook event where the start and end hashes are the same. There are some details around passing authentication parameters and connection strings but those are just details and they’re in the code for the interested reader.
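The naive algorithm is easy to sketch in shell. To be clear, this is not the twobit-git-poller code, just an illustration with made-up function names. It assumes the repo is a mirror clone (so a fetch updates refs/heads directly) with a remote named origin, and it relies on the post-receive convention of one `<old> <new> <ref>` line per ref on the hook’s stdin.

```shell
#!/bin/sh
# Print "<hash> <refname>" for every branch in the mirror at $1.
snapshot_refs() {
    git --git-dir="$1" for-each-ref \
        --format='%(objectname) %(refname)' refs/heads
}

# Given snapshots taken before ($1) and after ($2) a fetch, print one
# "<old> <new> <ref>" line per branch. Branches that did not exist before
# the fetch get an all-zero old hash. Unchanged branches are still printed.
emit_hook_lines() {
    awk '
        BEGIN { zero = "0000000000000000000000000000000000000000" }
        FILENAME == ARGV[1] { before[$2] = $1; next }
        { old = ($2 in before) ? before[$2] : zero; print old, $1, $2 }
    ' "$1" "$2"
}

# One polling pass: snapshot, fetch, snapshot, feed the hook script.
poll_once() {
    repo=$1 hook=$2
    tmp=$(mktemp -d) || return 1
    snapshot_refs "$repo" > "$tmp/before"
    git --git-dir="$repo" fetch --quiet --prune origin
    snapshot_refs "$repo" > "$tmp/after"
    emit_hook_lines "$tmp/before" "$tmp/after" | "$hook"
    rm -rf "$tmp"
}
```

Calling `poll_once /path/to/mirror.git /path/to/hook` from a loop or a cron job would approximate the daemon. Note that this sketch never reports deleted branches, which a real post-receive hook would see as a ref with an all-zero new hash.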

With this client side hooking mechanism we no longer need the poller objects in buildbot. We can just hook up a PBChangeSource that listens for data from the twobit-git-poller or any other change source. AFAIK this mechanism can live side by side with standard git hooks, so if you’ve got an existing buildbot that’s triggered by some git repos that your team is pushing to, using this poller shouldn’t interfere. If it does let me know about it so we can sort it out … or you could send me a pull request 🙂

Wrap Up

When I started this work I expected the git poller to be something that others may pick up and use and that maybe it would be adopted as part of the OpenXT build stuff. Now that I think about it though I expect the whole twobit-git-poller to be mostly disposable in the long run. We’ve already made huge progress in aligning the project with upstream Open Embedded and I expect that we’ll have a proper OE meta layer sooner than later. If this actually comes to pass there won’t be enough git repos to warrant such a heavy weight approach. The simple polling mechanisms in buildbot should be sufficient eventually.

Still I plan to maintain this work for however long it’s useful. I’ll also be maintaining the relevant buildbot config which I hope to generalize a bit. It may even become as general as the Yocto autobuilder but time will tell.

Apache VirtualHost config gone after Wheezy to Jessie upgrade

Here’s a fun one that had me running in circles for a while today:

I’ve been running deluge and the deluge-webui on Debian Wheezy for a while now. Pretty solid. I needed to download a torrent using a magnet URI today and deluge-webui on Wheezy won’t do it. This feature was added to the webui in 1.3.4 though so the version in Jessie should work.

I did the typical dist-upgrade song and dance per the usual but after the upgrade Apache was all hosed up. It was just showing the default example page. All of the access logs that would normally go to my configured virtual host were landing in /var/log/apache2/other_vhosts_access.log which is all wrong. I started out thinking it was the hostname of the system that got messed up but that was a dead end.

I started making progress when I found the command

apache2ctl -S

This dumps out a bunch of data about your configuration and it basically said that my VirtualHost configuration was empty:

VirtualHost configuration:

Yeah it was basically an empty string. This seemed wrong but I wasn’t sure what to expect really. After banging around a bit longer and getting no where I finally decided to just disable and re-enable my site configuration. This was desperation because my site config was already linked into /etc/apache2/sites-enabled so it must have been enabled … right?

a2dissite mysite

But disabling it failed! It gave me some sort of “no such file” error. Whaaaaaa? So I ran the command through strace and it turns out that the new apache2 package on Jessie expects the site config file to have the suffix .conf. Changing the name of my site config fragment fixed this and I was then able to enable the config as expected.
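If you have more than one site config to fix, the rename is easy to script. This is a sketch with made-up function names, and it assumes your configs live in the stock Debian /etc/apache2/sites-available directory.

```shell
#!/bin/sh
# List regular files in directory $1 whose names lack the .conf suffix.
sites_missing_suffix() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.conf) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Rename each of them in place by appending .conf.
add_conf_suffix() {
    sites_missing_suffix "$1" | while IFS= read -r f; do
        mv -- "$f" "$f.conf"
    done
}
```

After `add_conf_suffix /etc/apache2/sites-available` the old symlink in sites-enabled will most likely be left dangling, so remove it by hand before running `a2ensite mysite` and reloading Apache.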

That was unbelievably annoying. Hopefully this will save someone else a few minutes.

building HVM Xen guests

On my Xen systems I’ve run pretty much 99% of my Linux guests paravirtualized (PV). Mostly this was because I’m lazy. Setting up a PV guest is super simple. No need for partitions, boot loaders or any of that complicated stuff. Setting up a PV Linux guest is generally as simple as setting up a chroot. You don’t even need to install a kernel.

There’s been a lot of work over the past 5+ years to add stuff to processors and Xen to make the PV extensions to Linux unnecessary. After checking out a presentation by Stefano Stabellini a few weeks back I decided I’m long overdue for some HVM learning. Since performance of HVM guests is now better than PV for most cases it’s well worth the effort.

This post will serve as my documentation for setting up HVM Linux guests. My goal was to get an HVM Linux installed using typical Linux tools and methods like LVM and chroots. I explicitly was trying to avoid using RDP or anything that isn’t a command-line utility. I wasn’t completely successful at this but hopefully I’ll figure it out in the next few days and post an update.

Disks and Partitions

Like every good Linux user, I consider LVM my friend. I’d love a more flexible disk backend (something that could be sparsely populated) but blktap2 is pretty much unmaintained these days. I’ll stop before I fall down that rabbit hole but long story short, I’m using LVM volumes to back my guests.

There’s a million ways to partition a disk. Generally my VMs are single-purpose and simple so a simple partitioning scheme is all I need. I haven’t bothered with extended partitions as I only need 3. The layout I’m using is best described by the output of sfdisk:

# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors

/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0

That’s 3 partitions, the first for /boot, the second for swap and the third for the rootfs. Pretty simple. Once the partition table is written to the LVM volume we need to get the kernel to read the new partition table to create devices for these partitions. This can be done with either the partprobe command or kpartx. I went with kpartx:

$ kpartx -a /dev/mapper/myvg-hvmdisk

After this you’ll have the necessary device nodes for all of your partitions. If you use kpartx as I have these device files will have a digit appended to them like the output of sfdisk above. If you use partprobe they’ll have the letter ‘p’ and a digit for the partition number. Other than that I don’t know that there’s a difference between the two methods.

Then get udev to refresh the links in /dev/disk/by-uuid (we’ll use these later):

$ udevadm trigger

Now we can set up the filesystems we need:

$ mkfs.ext2 /dev/mapper/myvg-hvmdisk1
$ mkswap /dev/mapper/myvg-hvmdisk2
$ mkfs.ext4 /dev/mapper/myvg-hvmdisk3

Install Linux

Installing Linux on these partitions is just like setting up any other chroot. First step is mounting everything. The following script fragment does this:

# mount VM disks (partitions in new LV)
if [ ! -d /media/hdd0 ]; then mkdir /media/hdd0; fi
mount /dev/mapper/myvg-hvmdisk3 /media/hdd0
if [ ! -d /media/hdd0/boot ]; then mkdir /media/hdd0/boot; fi
mount /dev/mapper/myvg-hvmdisk1 /media/hdd0/boot

# bind dev/proc/sys/tmpfs file systems from the host
if [ ! -d /media/hdd0/proc ]; then mkdir /media/hdd0/proc; fi
mount --bind /proc /media/hdd0/proc
if [ ! -d /media/hdd0/sys ]; then mkdir /media/hdd0/sys; fi
mount --bind /sys /media/hdd0/sys
if [ ! -d /media/hdd0/dev ]; then mkdir /media/hdd0/dev; fi
mount --bind /dev /media/hdd0/dev
if [ ! -d /media/hdd0/run ]; then mkdir /media/hdd0/run; fi
mount --bind /run /media/hdd0/run
if [ ! -d /media/hdd0/run/lock ]; then mkdir /media/hdd0/run/lock; fi
mount --bind /run/lock /media/hdd0/run/lock
if [ ! -d /media/hdd0/dev/pts ]; then mkdir /media/hdd0/dev/pts; fi
mount --bind /dev/pts /media/hdd0/dev/pts

Now that all of the mounts are in place we can debootstrap an install into the chroot:

$ sudo debootstrap wheezy /media/hdd0/ http://http.debian.net/debian/

We can then chroot to the mountpoint for our new VM’s rootfs and put on the finishing touches:

$ chroot /media/hdd0

Bootloader

Unlike a PV guest, you’ll need a bootloader to get your HVM up and running. A first step in getting the bootloader installed is figuring out which disk will be mounted and where. This requires setting up your fstab file.

At this point we start to run into some awkward differences between our chroot and what our guest VM will look like once it’s booted. Our chroot reflects the device layout of the host on which we’re building the VM. This means that the device names for these disks will be different once the VM boots. On our host they’re all under the LVM /dev/mapper/myvg-hvmdisk and once the VM boots they’ll be something like /dev/xvda.

The easiest way to deal with this is to set our fstab up using UUIDs. This would look something like this:

# / was on /dev/xvda3 during installation
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /               ext4    errors=remount-ro 0       1
# /boot was on /dev/xvda1 during installation
UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot           ext2    defaults        0       2
# swap was on /dev/xvda2 during installation
UUID=zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz none            swap    sw              0       0

By using UUIDs we can make our fstab accurate even in our chroot.
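Rather than pasting UUIDs by hand, the fstab can be generated on the host. A minimal sketch, assuming the partition layout from earlier in this post (1 is /boot, 2 is swap, 3 is the rootfs); the function names are made up for illustration.

```shell
#!/bin/sh
# Format one fstab entry: UUID, mount point, fs type, options, dump, pass.
fstab_line() {
    printf 'UUID=%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Look up a partition's filesystem UUID on the host with blkid.
uuid_of() {
    blkid -s UUID -o value "$1"
}

# Emit the three entries used in this post for the LV at $1.
make_fstab() {
    lv=$1
    fstab_line "$(uuid_of "${lv}3")" /     ext4 errors=remount-ro 0 1
    fstab_line "$(uuid_of "${lv}1")" /boot ext2 defaults          0 2
    fstab_line "$(uuid_of "${lv}2")" none  swap sw                0 0
}
```

Run as root on the host, outside the chroot: `make_fstab /dev/mapper/myvg-hvmdisk > /media/hdd0/etc/fstab`.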

After this we need to set up the /etc/mtab file needed by lots of Linux utilities. I found that when installing Grub2 I needed this file in place and accurate.

Some data I’ve found on the web says to just copy or link the mtab file from the host into the chroot but this is wrong. If a utility consults this file to find the device file that’s mounted as the rootfs it will find the device holding the rootfs for the host, not the device that contains the rootfs for our chroot.

The way I made this file was to copy it off of the host where I’m building the guest VM and then modify it for the guest. Again I’m using UUIDs to identify the disks / partitions for the rootfs and /boot to keep from having data specific to the host platform leak into the guest. My final /etc/mtab looks like this:

rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=253371,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=203892k,mode=755 0 0
/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=617480k 0 0
/dev/disk/by-uuid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 rw,relatime,errors=continue,user_xattr,acl 0 0

Finally we need to install both a kernel and the grub2 bootloader:

$ apt-get install linux-image-amd64 grub2

Installing Grub2 is a pain. All of the additional disks kicking around in my host confused the hell out of the grub installer scripts. I was given the option to install grub on a number of these disks and none were the one I wanted to install it on.

In the end I had to select the option to not install grub on any disk and fall back to installing it by hand:

$ grub-install --force --no-floppy --boot-directory=/boot /dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

And then generate the grub config file:

$ update-grub

If all goes well the grub boot loader should now be installed on your disk and you should have a grub config file in your chroot /boot directory.

Final Fixups

Finally you’ll need to log into the VM. If you’re confident it will boot without you having to do any debugging then you can just configure the ssh server to start up and throw a public key in the root homedir. If you’re like me something will go wrong and you’ll need some boot logs to help you debug. I like enabling the serial emulation provided by qemu for this purpose. It’ll also allow you to login over serial which is convenient.

This is pretty standard stuff. No paravirtual console through the xen console driver. The qemu emulated serial console will show up at ttyS0 like any physical serial hardware. You can enable serial interaction with grub by adding the following fragment to /etc/default/grub:

GRUB_TERMINAL_INPUT=serial
GRUB_TERMINAL_OUTPUT=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"

To get your kernel to log to the serial console as well set the GRUB_CMDLINE_LINUX variable thusly:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,38400n8"

Finally to get init to start a getty with a login prompt on the console add the following to your /etc/inittab:

T0:23:respawn:/sbin/getty -L ttyS0 38400 vt100

Stefano Stabellini has done another good write-up on the details of using both the PV and the emulated serial console here: http://xenbits.xen.org/docs/4.2-testing/misc/console.txt. Give it a read for the gory details.

Once this is all done, re-run update-grub so the changes to /etc/default/grub make it into the generated config. Then exit the chroot, unmount all of those bind mounts and unmount your boot and rootfs from the chroot directory. Once we have a VM config file created this VM should be bootable.

VM config

Then we need a configuration file for our VM. This is what my generic HVM template looks like. I’ve disabled all graphical stuff: sdl=0, stdvga=0, and vnc=0, enabled the emulated serial console: serial='pty' and set xen_platform_pci=1 so that my VM can use PV drivers.

The rest is standard for HVM guests, apart from values like memory, name, and uuid, which should be customized for your specific installation. Things like uuid and the MAC address for your virtual NIC should be unique. There are websites out there that will generate these values. Xen has its own prefix for MAC addresses (00:16:3e) so use a generator to make a proper one.

builder = "hvm"
memory = "2048"
name = "myvm"
uuid = "uuuuuuuu-uuuu-uuuu-uuuu-uuuuuuuuuuuu"
vcpus = 1
cpus = '0-7'
pae=1
acpi=1
apic=1
boot='c'
xen_platform_pci=1
sdl=0
vnc=0
vnclisten='0.0.0.0'
stdvga=0
serial='pty'

disk = [
    '/dev/ssdraid1/wwwhome,raw,xvda,rw'
]
vif = [
    'mac=XX:XX:XX:XX:XX:XX,model=e1000',
]
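If you’d rather not rely on a website, the uuid and mac placeholders in the template above can be generated locally. A minimal sketch, assuming a Linux host: gen_xen_mac and gen_uuid are made-up helper names, the MAC is three random bytes after the 00:16:3e prefix registered to Xen, and /proc/sys/kernel/random/uuid is a kernel file that returns a fresh random UUID on every read.

```shell
#!/bin/sh
# Hypothetical helper: print a random MAC address under the Xen prefix 00:16:3e.
gen_xen_mac() {
    # od prints three random bytes as unsigned decimal numbers.
    set -- $(od -A n -N 3 -t u1 /dev/urandom)
    printf '00:16:3e:%02x:%02x:%02x\n' "$1" "$2" "$3"
}

# Hypothetical helper: print a random UUID (Linux-specific path).
gen_uuid() {
    cat /proc/sys/kernel/random/uuid
}
```

Paste the output of gen_uuid into the uuid line and the output of gen_xen_mac into the mac= part of the vif line.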

Boot

Booting this VM is just like booting any PV guest:

xl create -c /etc/xen/vms/myvm.cfg

I’ve included the -c option to attach to the VM’s serial console and ideally we’d be able to see grub and the kernel dump a bunch of data as the system boots.

TODO

I’ve tested these instructions twice now on a Debian Wheezy system with Xen 4.3.1 installed from source. Both times Grub installs successfully but fails to boot. After enabling VNC for the VM and connecting with a viewer it’s apparent that the VM hangs when SeaBIOS tries to kick off grub.

As a work-around both times I’ve booted the VM from a Debian rescue ISO, set up a chroot much like in these instructions (the disk is now /dev/xvda though) and re-installed Grub. This does the trick and rebooting the VM from the disk now works. So either something in my instructions for installing Grub is wrong, or something else is going on. I think the former is unlikely as the steps match numerous other “install grub in a chroot” instructions on the web.

The source of the problem is speculation at this point. Part of me wants to dump the first 2M of my disk both after installing it using these instructions and then again after fixing it with the rescue CD. Now that I think about it the version of Grub installed in my chroot is probably a different version than the one on the rescue CD so that could have something to do with it.
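A rough sketch of what that comparison could look like. dump_head and same_head are made-up helper names and the image file names are placeholders; the device would be whatever disk backs the VM.

```shell
#!/bin/sh
# Hypothetical helper: copy the first 2 MiB of DEVICE (or any file) to OUTFILE.
dump_head() {
    dd if="$1" of="$2" bs=1048576 count=2 2>/dev/null
}

# Hypothetical helper: succeed if two dumps are byte-for-byte identical;
# otherwise cmp reports the first offset where they differ.
same_head() {
    cmp "$1" "$2"
}
```

The idea would be to run dump_head on the disk once after the chroot install (say into before.img), once more after the rescue CD fix (after.img), and then call same_head before.img after.img to see where the two diverge.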

Really though, I’ll probably just install syslinux and see if that works first. My experiences with Grub have generally been bad any time I try to do something out of the ordinary. It’s incredibly complicated and generally I just want something simple like syslinux to kick off a very simple VM.

I’ll post an update once I’ve got to the bottom of this mystery. Stay tuned.

TXT Capable Desktop Virtualization System

Having worked on XenClient XT for the past year I’ve experienced the pain of debugging vendors’ TXT implementations first hand. TXT may be a nearly 6 year old technology but it’s just now coming into use and many vendors’ platforms have only received internal testing. We’ve found a number of ways for platforms to fail in strange ways and we’ve had to work with the vendors to get their implementations working for a system like XT that uses tboot as part of our measured launch.

For development Citrix has provided me with a number of systems but I’ve been meaning to put one together for myself for some time now. I’ve always liked building my own so I wasn’t thrilled with the prospect of purchasing a Dell / HP system. Home builds are always a bit cooler, a bit cheaper, and more fun in general. That said I was a bit worried about being able to find a motherboard / CPU combo with full AND WORKING VT-x, VT-d and TXT. It wasn’t as bad as I expected. So the following is a breakdown of the home build system I put together specifically to run XT.

Case

I always start building systems with the case. This will dictate size which in turn limits your choices for motherboards. I’ve had a string of successes building systems in Lian Li cases so again they were my first choice. I wanted this system to be as small as possible. Lian Li happens to make probably the best mini-ITX case on the market: the PC-Q02A. This case is tiny and it comes bundled with a 300W power supply. No room in the back of the case for PCI cards either so if you buy this don’t expect to throw a graphics card in it. Whatever you need has to be on the motherboard!

CPU

Since I intend to run XT on this system the CPU has to support the full Intel vPro suite including TXT. This limited me to high-end intel i5 and i7 processors. Since this system will be in a small, low power case I wanted a 65W CPU and went with the Intel i7-2600S. CPUs aren’t really where you want to save money on your build so I didn’t skimp here.

Motherboard

The motherboard is really where vPro and TXT are either made or broken. The BIOS is where CPU features are either enabled or disabled and many motherboard vendors don’t list anything in their docs about TXT compatibility. This is mostly because home users typically don’t really care. In this case we do so some research is required. I played it safe and went with an Intel DQ67EP. TXT and the TPM worked flawlessly. One thing that was a deviation from the platforms from Dell and HP I’ve played with was the TPM came without an EK populated. It’s a simple case of running tpm_createek on the system but because all of the vendor platforms come with an EK pre-populated the XT code doesn’t account for this situation. Easy to work around.

RAM

From here it’s just a matter of getting RAM and hard drive / CD-ROM. Since this system will be running virtualized desktops the more RAM the better. XT doesn’t over commit RAM so if you want to run two desktops with 4G of RAM each you’ll need a bit more than 8G of RAM since the dom0 and service VMs will need a bit too. I picked up 16G of G.SKILL Ares RAM. Generally I run 3 desktops: one Debian Squeeze for development, one Debian Wheezy for my personal stuffs and one Windows 7 for required email and GoToMeetings, each with 4G of RAM. This system has no problems handling all 3 at the same time.

Disks

With disks, faster is always better and so is bigger. This is where I saved money though since I’ve already got a huge (6T!) NFS server for bulk storage. I went with a modest 120G OCZ Agility 3. I haven’t done any benchmarking but it’s big enough for my root filesystems and fast enough as well. I also put in a Sony Optiarc BC-5650H-01 6X Slim Blu-Ray Reader. I got it on the cheap from an ebay seller. It’s one of those fancy slimline drives so there’s no tray to eject. I hardly ever use optical media except to rip CDs on to my NFS storage. Not sure how much I’ll use this but it’s nice even though it doesn’t match the case 🙂

Photos

I wish there were something in this last photo to give some perspective on the size of the case. Basically it’s the size of a lunchbox 🙂

XT Configuration

Once it’s all put together with XT installed it pretty much “just works”. The USB stack on XT isn’t perfect so some USB devices won’t work unless you pass the USB controller directly through to a guest using Xen’s PCI passthrough stuff. Since XT doesn’t support USB 3.0 anyway I’ve passed the 3.0 controller on the motherboard through directly to my Windows 7 guest to get my webcam working. The Plantronics headset you see in the photo works fine over the virtualized USB stack though and together they make for a pretty sweet VoIP / Video Conferencing / GoToMeeting setup. So that’s it. A home build Intel i7 system with 16G of RAM, an SSD and Intel TXT / measured boot running XenClient XT. Pretty solid home system by my standards.

Chrome web sandbox on XenClient

There’s lots of software out there that sets up a “sandbox” to protect your system from untrusted code. The examples that come to mind are Chrome and Adobe for the flash sandbox. The strength of these sandboxes is an interesting point of discussion. Strength is always related to the mechanism and if you’re running on Windows the separation guarantees you get are only as strong as the separation Windows affords to processes. If this is a strong enough guarantee for you then you probably won’t find this post very useful. If you’re interested in using XenClient and the Xen hypervisor to get yourself the strongest separation that I can think of, then read on!

Use Case

XenClient allows you to run any number of operating systems on a single piece of hardware. In my case this is a laptop. I’ve got two VMs: my work desktop (Windows 7) for email and other work stuff and my development system that runs Debian testing (Wheezy as of now).

Long story short, I don’t trust some of the crap that’s out on the web to run on either of these systems. I’d like to confine my web browsing to a separate VM to protect my company’s data and my development system. This article will show you how to build a bare bones Linux VM that runs a web browser (Chromium) and little more.

Setup

You’ll need a Linux VM to host your web browser. I like Debian Wheezy since the PV xen drivers for network and disk work out of the box on XenClient (2.1). There’s a small bug that requires you to use LVM for your rootfs but I typically do that anyway so no worries there.

Typically I do an install omitting even the “standard system tools” to keep things as small as possible. This results in a root file system that’s < 1G. All you need to do then is install the web browser (chromium), rungetty, and the xinit package. Next is a bit of scripting and some minor configuration changes.

inittab

When this VM boots we want the web browser to launch and run full screen. We don’t want a window manager or anything. Just the browser.

When Linux boots, the init process parses the /etc/inittab file. Among the things specified in inittab are the processes that init starts, like getty. Typically inittab starts gettys on 6 ttys but we want it to start chrome for us. We can do this by having init execute rungetty (read the man page!) which we can then have execute arbitrary commands for us:

# /sbin/getty invocations for the runlevels.
#
# The "id" field MUST be the same as the last
# characters of the device (after "tty").
#
# Format:
#  <id>:<runlevels>:<action>:<process>
#
# Note that on most Debian systems tty7 is used by the X Window System,
# so if you want to add more getty's go ahead but skip tty7 if you run X.
#
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
3:23:respawn:/sbin/getty 38400 tty3
4:23:respawn:/sbin/getty 38400 tty4
5:23:respawn:/sbin/getty 38400 tty5
6:23:respawn:/sbin/rungetty tty6 -u root /usr/sbin/chrome-restore.sh

Another configuration change you’ll have to make is in /etc/X11/Xwrapper.config. The default configuration in this file prevents users from starting X if their controlling TTY isn’t a virtual console. Since we’re kicking off chromium directly we need to relax this restriction:

allowed_users=anybody

chromium-restore script

Notice that we have rungetty execute a script for us and it does so as the root user. We don’t want chromium running as root but we need to do some set-up before we kick off chromium as an unprivileged user. Here’s the chrome-restore.sh script:

#!/bin/sh

USER=chromium
HOMEDIR=/home/${USER}
HOMESAFE=/usr/share/${USER}-clean
CONFIG=${HOMEDIR}/.config/chromium/Default/Preferences
LAUNCH=$(which chromium-launch.sh)
if [ ! -x "${LAUNCH}" ]; then
	echo "chromium-launch.sh not executable: ${LAUNCH}"
	exit 1
fi
CMD="${LAUNCH} ${CONFIG}"

rsync -avh --delete ${HOMESAFE}/ ${HOMEDIR}/ > /dev/null 2>&1
chown -R ${USER}:${USER} ${HOMEDIR}

/bin/su - -- ${USER} -l -c "STARTUP=\"${CMD}\" startx" < /dev/null
shutdown -Ph now

The first part of this script is setting up the home directory for the user (chromium) that will be running chromium. This is the equivalent of us restoring the user’s home directory to a “known good state”. This means that the directory located at /usr/share/chromium-clean is a “known good” home directory for us to start from. On my system it’s basically an empty directory with chrome’s default config.

The second part of the script, well really just the su line, runs startx as an unprivileged user. startx kicks off the X server but first we set a variable STARTUP to be the name of another script: chromium-launch.sh. When this variable is set, startx runs the command from the variable after the X server is started. This is a convenient way to kick off an X server that runs just a single graphical application.

The last command shuts down the VM. The shutdown command will only be run after the X server terminates which will happen once the chromium process terminates. This means that once the last browser tab is closed the VM will shutdown.

chromium-launch script

The chromium-launch.sh script looks like this:

#!/bin/sh

CONFIG=$1
if [ ! -f "${CONFIG}" ]; then
	echo "cannot locate CONFIG: ${CONFIG}"
	exit 1
fi

LINE=$(xrandr -q 2> /dev/null | grep Screen)
WIDTH=$(echo ${LINE} | awk '{ print $8 }')
HEIGHT=$(echo ${LINE} | awk '{ print $10 }' | tr -d ',')

sed -i -e "s&\(\s\+\"bottom\":\s\+\)-\?[0-9]\+&\1${HEIGHT}&" ${CONFIG}
sed -i -e "s&\(\s\+\"left\":\s\+\)-\?[0-9]\+&\10&" ${CONFIG}
sed -i -e "s&\(\s\+\"right\":\s\+\)-\?[0-9]\+&\1${WIDTH}&" ${CONFIG}
sed -i -e "s&\(\s\+\"top\":\s\+\)-\?[0-9]\+&\10&" ${CONFIG}
sed -i -e "s&\(\s\+\"work_area_bottom\":\s\+\)-\?[0-9]\+&\1${HEIGHT}&" ${CONFIG}
sed -i -e "s&\(\s\+\"work_area_left\":\s\+\)-\?[0-9]\+&\10&" ${CONFIG}
sed -i -e "s&\(\s\+\"work_area_right\":\s\+\)-\?[0-9]\+&\1${WIDTH}&" ${CONFIG}
sed -i -e "s&\(\s\+\"work_area_top\":\s\+\)-\?[0-9]\+&\10&" ${CONFIG}

chromium

It’s a pretty simple script. It takes one parameter which is the path to the main chromium config file. It queries the X server through xrandr to get the screen dimensions (WIDTH and HEIGHT) which means it must be run after the X server starts. It then re-writes the relevant values in the config file to the maximum screen width and height so the browser is run “full screen”. Pretty simple stuff … once you figure out the proper order to do things and the format of the Preferences file which was non-trivial.
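To see the two halves of that script in isolation, here is a sketch that does the same parsing and substitution on text from stdin instead of a live X server and a real config file. screen_size and set_pref are made-up helper names, and the sed expression is written with -E and POSIX character classes rather than copied from the script.

```shell
#!/bin/sh
# Hypothetical helper: read xrandr -q output on stdin and print "WIDTH HEIGHT"
# taken from the "current W x H," part of the Screen line.
screen_size() {
    awk '/^Screen/ { sub(/,/, "", $10); print $8, $10; exit }'
}

# Hypothetical helper: set integer key KEY to VALUE in Preferences-style text
# read from stdin. Usage: set_pref KEY VALUE
set_pref() {
    sed -E "s&([[:space:]]+\"$1\":[[:space:]]+)-?[0-9]+&\\1$2&"
}
```

Feeding screen_size a line such as "Screen 0: minimum 320 x 200, current 1280 x 800, maximum 8192 x 8192" gives the two numbers the script stores in WIDTH and HEIGHT. Because the pattern insists on whitespace right before the quoted key, set_pref bottom leaves a "work_area_bottom" entry alone, which is why the script needs a separate line for each key.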

User Homedir

The other hard part is creating the “known good” home directory for your unprivileged user. What I did was start up chromium once manually. This causes the standard chromium configuration to be generated with default values. I then copied this off to /usr/share to be extracted on each boot.
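Capturing that snapshot could look something like the following. It’s a sketch under assumptions: save_clean_home is a made-up name, and cp -a is used here only to keep the example dependency-free (rsync -a would do the same job).

```shell
#!/bin/sh
# Hypothetical helper: copy the contents of SRC (a freshly initialised home
# directory) into DST, preserving ownership, modes and dotfiles.
# Usage: save_clean_home SRC DST
save_clean_home() {
    mkdir -p "$2" && cp -a "$1/." "$2/"
}
```

With the paths the restore script expects this would be save_clean_home /home/chromium /usr/share/chromium-clean, run as root after chromium has been closed.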

Conclusion

So hopefully these instructions are enough to get you a Linux system that boots and runs Chromium as an unprivileged user. It should restore that user’s home directory to a known good state on each boot so that any downloaded data will be wiped clean. When the last browser tab is closed it will power off the system.

I use this on my XenClient XT system for browsing sites that I want to keep separate from my other VMs. It’s not perfect though and as always there is more that can be done to secure it. I’d start by making the root file system read only and adding SELinux would be fun. Also the interface is far too minimal. Finding a way to handle edge cases like making pop-ups manageable and allowing us to do things like control volume levels would also be nice. This may require configuring a minimal window manager which is a pretty daunting task. If you have any other interesting ways to make this VM more usable or lock it down better you should leave them in the comments.

Atom Based Home NFS

It’s been a while since I’ve posted any new content but that’s not because I haven’t been doing anything worthy of mention. During the few breaks I’ve had from my day job I took the time to replace my QNAP 419p NAS with a custom system to host NFS shares. Here’s just a quick laundry list of the parts I settled on after quite a bit of shopping.

My requirements for this system were really basic. All it does is host NFS shares and run rTorrent from time to time. There was no need to use a full desktop grade processor but I still wanted something x86 compatible with a good bit of ram and a PCI-E port so I could use a real hardware RAID card.

Motherboard

The Super Micro MBD-X7SPE-HF-D525 was a good fit. It has a sweet little Atom D525 soldered on to the board and it’s surprisingly quick (1.8GHz) though it’s no work horse. There’s only one 16x PCI-E port but that’s all I need. It supports up to 4G of ram though it’s pretty low end laptop memory. It also has two NICs so you can bond them for more throughput if you want to get fancy.

If you check out the Super Micro site you’ll see they advertise this board as having RAID on board. It’s still not hardware RAID though and since the RAID is done in firmware / driver the Atom processor on this board would have a very hard time keeping up with parity calculations under heavy usage. There’s plenty of literature out there about “fake RAID” so don’t be fooled.

RAID Card

I’ve had excellent luck with 3ware in the past so when they were bought out by LSI I was skeptical. Some of their older cards are still branded 3ware and I tracked down a 3ware 9650SE 8LPML on ebay for under $300 … since I couldn’t afford it new at close to $500 when I was looking 3 months back.

The Linux drivers for this card and the management software are still great. Setting up RAID 6 was easy though the card requires 5 drives to do this RAID level where theoretically it should only need 4.

RAM

The memory for the Super Micro board is practically free. I tracked down two 2G SODIMMs of Kingston ValueRAM for like $30. Not the fastest ram in the world but it works.

Disks

Buying 5 hard disks is not cheap. This is especially true if you want to get big drives and ones that will be fast enough to last. I went with 5 Western Digital AV-GP WD20EURS 2TB 3.0Gb/s drives. They’re nice and big, pretty fast and quiet. User reviews on sites like Newegg are a good way to check up on products before you buy. This one had particularly good reviews and so far they’ve been great. None were DOA, which drives have a tendency to be. They also run relatively cool.
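RAID 6 spends two drives’ worth of space on parity, so the usable capacity of an array of equal drives is (drives - 2) times the drive size. A trivial sketch of that arithmetic, with raid6_usable as a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: usable capacity of a RAID 6 array of N equal drives.
# Usage: raid6_usable N SIZE   (the result is in the same unit as SIZE)
raid6_usable() {
    echo $(( ($1 - 2) * $2 ))
}

raid6_usable 5 2   # five 2TB drives
```

For the five 2TB drives here that works out to 6TB before any filesystem overhead.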

Drive Enclosure

With this many drives it’s worth investing in an enclosure to hold them. You won’t have to rig fans on the drives to keep air moving over them and you’ll have the luxury of easy swapping if something goes wrong and you need to swap out drives. There isn’t much available in this space so your choices are limited. I went with 2x ICY DOCK MB453SPF-B 3 in 2 enclosures.

I was a bit pissed when one of the enclosures showed up DOA but Newegg was pretty good about doing a quick exchange. It cost me a few extra bucks in shipping but it could have been worse … ::shrug::

Case

I had an old case from Lian Li and a small power supply that have been sitting in my basement unused for several years. Luckily the case had the 4 5.25 bays that I need to hold these enclosures. Fitting enclosures into cases never goes as planned though.

The case is super nice but it had these little tabs that stuck out into the drive bays. These were intended to support individual 5.25 drives but they just got in the way of these enclosures. They were easy enough to remove with a saw and a file.

That’s about all there is to it. Setting up the RAID so I could boot Linux from the 3ware card was a bit of a pain and I wish I had documented the process. Setting it up through the firmware interface took some experimentation but it is possible. Then just install Debian and an NFS server and you’re done.