On my Xen systems I’ve run pretty much 99% of my Linux guests paravirtualized (PV). Mostly this was because I’m lazy. Setting up a PV guest is super simple. No need for partitions, boot loaders or any of that complicated stuff. Setting up a PV Linux guest is generally as simple as setting up a chroot. You don’t even need to install a kernel.
There's been a lot of work over the past 5+ years to add stuff to processors and Xen to make the PV extensions to Linux unnecessary. After checking out a presentation by Stefano Stabellini a few weeks back I decided I'm long overdue for some HVM learning. Since the performance of HVM guests is now better than PV in most cases, it's well worth the effort.
This post will serve as my documentation for setting up HVM Linux guests. My goal was to get an HVM Linux installed using typical Linux tools and methods like LVM and chroots. I was explicitly trying to avoid using VNC or anything that isn't a command-line utility. I wasn't completely successful at this, but hopefully I'll figure it out in the next few days and post an update.
Disks and Partitions
Like every good Linux user, LVM is my friend. I'd love a more flexible disk backend (something that could be sparsely populated) but blktap2 is pretty much unmaintained these days. I'll stop before I fall down that rabbit hole, but long story short, I'm using LVM volumes to back my guests.
There's a million ways to partition a disk. Generally my VMs are single-purpose and simple so a simple partitioning scheme is all I need. I haven't bothered with extended partitions as I only need 3. The layout I'm using is best described by the output of sfdisk:
# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors

/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0
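For reference, creating the backing LV and writing that table can be scripted. Here's a rough sketch, assuming a volume group named myvg and a 10GiB volume (both derived from the sector counts above; depending on your sfdisk version you may also need --force when writing to a fresh device):

$ lvcreate -L 10G -n hvmdisk myvg
$ sfdisk /dev/mapper/myvg-hvmdisk <<EOF
# partition table of /dev/mapper/myvg-hvmdisk
unit: sectors

/dev/mapper/myvg-hvmdisk1 : start=     2048, size=  2097152, Id=83
/dev/mapper/myvg-hvmdisk2 : start=  2099200, size=  2097152, Id=82
/dev/mapper/myvg-hvmdisk3 : start=  4196352, size= 16775168, Id=83
/dev/mapper/myvg-hvmdisk4 : start=        0, size=        0, Id= 0
EOF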
That's 3 partitions, the first for /boot, the second for swap, and the third for the rootfs. Pretty simple. Once the partition table is written to the LVM volume we need to get the kernel to read the new partition table and create devices for these partitions. This can be done with either the partprobe command or kpartx. I went with kpartx:
$ kpartx -a /dev/mapper/myvg-hvmdisk
After this you'll have the necessary device nodes for all of your partitions. If you use kpartx as I have, these device files will have a digit appended to them like the output of sfdisk above. If you use partprobe they'll have the letter 'p' and a digit for the partition number. Other than that I don't know that there's a difference between the two methods.
Then get the kernel to refresh the links in /dev/disk/by-uuid (we'll use these later):
$ udevadm trigger
Now we can set up the filesystems we need:
$ mkfs.ext2 /dev/mapper/myvg-hvmdisk1
$ mkswap /dev/mapper/myvg-hvmdisk2
$ mkfs.ext4 /dev/mapper/myvg-hvmdisk3
Install Linux
Installing Linux on these partitions is just like setting up any other chroot. The first step is mounting everything; the following script fragment takes care of the disk mounts and the bind mounts the chroot will need:
# mount VM disks (partitions in new LV)
if [ ! -d /media/hdd0 ]; then mkdir /media/hdd0; fi
mount /dev/mapper/myvg-hvmdisk3 /media/hdd0
if [ ! -d /media/hdd0/boot ]; then mkdir /media/hdd0/boot; fi
mount /dev/mapper/myvg-hvmdisk1 /media/hdd0/boot

# bind dev/proc/sys/tmpfs file systems from the host
if [ ! -d /media/hdd0/proc ]; then mkdir /media/hdd0/proc; fi
mount --bind /proc /media/hdd0/proc
if [ ! -d /media/hdd0/sys ]; then mkdir /media/hdd0/sys; fi
mount --bind /sys /media/hdd0/sys
if [ ! -d /media/hdd0/dev ]; then mkdir /media/hdd0/dev; fi
mount --bind /dev /media/hdd0/dev
if [ ! -d /media/hdd0/run ]; then mkdir /media/hdd0/run; fi
mount --bind /run /media/hdd0/run
if [ ! -d /media/hdd0/run/lock ]; then mkdir /media/hdd0/run/lock; fi
mount --bind /run/lock /media/hdd0/run/lock
if [ ! -d /media/hdd0/dev/pts ]; then mkdir /media/hdd0/dev/pts; fi
mount --bind /dev/pts /media/hdd0/dev/pts
Now that all of the mounts are in place we can debootstrap an install into the chroot:
$ sudo debootstrap wheezy /media/hdd0/ http://http.debian.net/debian/
We can then chroot to the mountpoint for our new VM's rootfs and put on the finishing touches:
$ chroot /media/hdd0
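What those finishing touches look like is up to you; here's a minimal sketch run inside the chroot (the hostname myvm is just an example, and the ssh server is only needed if you plan to log in over the network later):

# set a root password so you can log in on the console
passwd
# give the guest its own hostname
echo myvm > /etc/hostname
# optional: an ssh server for network logins
apt-get install openssh-server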
Bootloader
Unlike a PV guest, you'll need a bootloader to get your HVM up and running. A first step in getting the bootloader installed is figuring out which disk will be mounted and where. This requires setting up your fstab file.
At this point we start to run into some awkward differences between our chroot and what our guest VM will look like once it's booted. Our chroot reflects the device layout of the host on which we're building the VM. This means that the device names for these disks will be different once the VM boots. On our host they're all under the LVM volume /dev/mapper/myvg-hvmdisk and once the VM boots they'll be something like /dev/xvda.
The easiest way to deal with this is to set up our fstab using UUIDs. That looks something like this:
# / was on /dev/xvda3 during installation
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /     ext4 errors=remount-ro 0 1
# /boot was on /dev/xvda1 during installation
UUID=yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 defaults          0 2
# swap was on /dev/xvda2 during installation
UUID=zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz none  swap sw                0 0
By using UUIDs we can make our fstab accurate even in our chroot.
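One way to find the UUIDs for each partition is to run blkid on the host against the device nodes kpartx created:

$ blkid /dev/mapper/myvg-hvmdisk1
$ blkid /dev/mapper/myvg-hvmdisk2
$ blkid /dev/mapper/myvg-hvmdisk3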
After this we need to set up the /etc/mtab file needed by lots of Linux utilities. I found that when installing Grub2 I needed this file in place and accurate.
Some data I've found on the web says to just copy or link the mtab file from the host into the chroot, but this is wrong. If a utility consults this file to find the device file that's mounted as the rootfs, it will find the device holding the rootfs for the host, not the device that contains the rootfs for our chroot.
The way I made this file was to copy it off of the host where I'm building the guest VM and then modify it for the guest. Again I'm using UUIDs to identify the disks / partitions for the rootfs and /boot to keep from having data specific to the host platform leak into the guest. My final /etc/mtab looks like this:
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=253371,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=203892k,mode=755 0 0
/dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=617480k 0 0
/dev/disk/by-uuid/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy /boot ext2 rw,relatime,errors=continue,user_xattr,acl 0 0
Finally we need to install both a kernel and the grub2 bootloader:
$ apt-get install linux-image-amd64 grub2
Installing Grub2 is a pain. All of the additional disks kicking around in my host confused the hell out of the grub installer scripts. I was given the option to install grub on a number of these disks and none were the one I wanted to install it on.
In the end I had to select the option to not install grub on any disk and fall back to installing it by hand:
$ grub-install --force --no-floppy --boot-directory=/boot /dev/disk/by-uuid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
And then generate the grub config file:
$ update-grub
If all goes well the grub boot loader should now be installed on your disk and you should have a grub config file in your chroot's /boot directory.
Final Fixups
Finally you'll need to log into the VM. If you're confident it will boot without you having to do any debugging then you can just configure the ssh server to start up and throw a public key in the root homedir. If you're like me, something will go wrong and you'll need some boot logs to help you debug. I like enabling the serial emulation provided by qemu for this purpose. It'll also allow you to log in over serial, which is convenient.
This is pretty standard stuff. There's no paravirtual console through the xen console driver here; the qemu emulated serial console will show up at ttyS0 like any physical serial hardware. You can enable serial interaction with grub by adding the following fragment to /etc/default/grub:
GRUB_TERMINAL_INPUT=serial
GRUB_TERMINAL_OUTPUT=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"
To get your kernel to log to the serial console as well, set the GRUB_CMDLINE_LINUX variable thusly:
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,38400n8"
Finally, to get init to start a getty with a login prompt on the console, add the following to your /etc/inittab:
T0:23:respawn:/sbin/getty -L ttyS0 38400 vt100
Stefano Stabellini has done another good write-up on the details of using both the PV and the emulated serial console here: http://xenbits.xen.org/docs/4.2-testing/misc/console.txt. Give it a read for the gory details.
Once this is all done you need to exit the chroot, unmount all of those bind mounts and then unmount your boot and rootfs from the chroot directory. Once we have a VM config file created this VM should be bootable.
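The tear-down is just the mount script from earlier run in reverse; here's a sketch, assuming the same /media/hdd0 paths:

exit
umount /media/hdd0/dev/pts
umount /media/hdd0/run/lock
umount /media/hdd0/run
umount /media/hdd0/dev
umount /media/hdd0/sys
umount /media/hdd0/proc
umount /media/hdd0/boot
umount /media/hdd0
# optionally remove the partition mappings kpartx created on the host
kpartx -d /dev/mapper/myvg-hvmdisk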
VM config
Then we need a configuration file for our VM. This is what my generic HVM template looks like. I've disabled all the graphical stuff (sdl=0, stdvga=0, and vnc=0), enabled the emulated serial console (serial='pty'), and set xen_platform_pci=1 so that my VM can use PV drivers.
The rest is standard for HVM guests, aside from stuff like memory, name, and uuid that should be customized for your specific installation. Things like uuid and the mac address for your virtual NIC should be unique. There are websites out there that will generate these values. Xen has its own prefix for MAC addresses, so use a generator to make a proper one.
builder = "hvm" memory = "2048" name = "myvm" uuid = "uuuuuuuu-uuuu-uuuu-uuuu-uuuuuuuuuuuu" vcpus = 1 cpus = '0-7' pae=1 acpi=1 apic=1 boot='c' xen_platform_pci=1 sdl=0 vnc=0 vnclisten='0.0.0.0' stdvga=0 serial='pty' disk = [ '/dev/ssdraid1/wwwhome,raw,xvda,rw' ] vif = [ 'mac=XX:XX:XX:XX:XX:XX,model=e1000', ]
Boot
Booting this VM is just like booting any PV guest:
$ xl create -c /etc/xen/vms/myvm.cfg
I've included the -c option to attach to the VM's serial console, and ideally we'd be able to see grub and the kernel dump a bunch of data as the system boots.
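If you detach from the console (or created the VM without -c), you can reattach at any time with:

$ xl console myvm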
TODO
I've tested these instructions twice now on a Debian Wheezy system with Xen 4.3.1 installed from source. Both times Grub installs successfully but fails to boot. After enabling VNC for the VM and connecting with a viewer, it's apparent that the VM hangs when SeaBIOS tries to kick off grub.
As a work-around, both times I've booted the VM from a Debian rescue ISO, set up a chroot much like in these instructions (the disk is now /dev/xvda though) and re-installed Grub. This does the trick and rebooting the VM from the disk now works. So I can only conclude that something in my instructions for installing Grub is wrong, but I think that's unlikely as they're confirmed by numerous other "install grub in a chroot" instructions on the web.
The source of the problem is pure speculation at this point. Part of me wants to dump the first 2M of my disk both after installing it using these instructions and again after fixing it with the rescue CD, and compare the two. Now that I think about it, the version of Grub installed in my chroot is probably a different version than the one on the rescue CD, so that could have something to do with it.
Really though, I’ll probably just install syslinux and see if that works first. My experiences with Grub have generally been bad any time I try to do something out of the ordinary. It’s incredibly complicated and generally I just want something simple like syslinux to kick off a very simple VM.
I’ll post an update once I’ve got to the bottom of this mystery. Stay tuned.