You do not have sufficient permissions to access this page

I’ve been working on a set of scripts to back up the WordPress instances on my server. While structuring these scripts I realized that I hadn’t structured my WP installs consistently. I had played around with using table prefixes to host multiple WP instances in a single database, but eventually I just broke them out into separate databases for simplicity.

The table prefixes persisted, however, and while I was poking around I decided to rename the tables to drop the prefixes. What I didn’t count on is that some values in the usermeta and options tables are prepended with the table prefix from wp-config.php. Without fixing up these values in the database your site will function normally, but the admin interface will only show an error message:

You do not have sufficient permissions to access this page.

The database values that need to be fixed up are some entries in the meta_key column of the usermeta table and the option_name column of the options table. Let’s assume that your prefix is pre_, that you’ve already removed the prefix from your table names, and that you now want to fix up these values. The following SQL commands will remove your old prefix from these tables:

UPDATE usermeta SET meta_key = REPLACE(meta_key,'pre_','');
UPDATE options SET option_name = REPLACE(option_name,'pre_','');
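
If you want to double check your work, a couple of read-only queries will show any rows still carrying the old prefix. This is just a sketch; the database name and user below are assumptions you’ll need to adapt:

#!/bin/sh
# Hypothetical database name and user; substitute your own.
DB="wordpress"
DBUSER="root"
# The backslash matters: '_' is a single-character wildcard in LIKE.
mysql -u "$DBUSER" -p "$DB" -e "
    SELECT user_id, meta_key FROM usermeta WHERE meta_key LIKE 'pre\\_%';
    SELECT option_id, option_name FROM options WHERE option_name LIKE 'pre\\_%';"

If both queries come back empty after the UPDATEs above, the cleanup is complete.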

References

http://www.tech-evangelist.com/2010/02/06/wordpress-error-sufficient-permissions/
http://wordpress.org/support/topic/changed-table-prefix-got-insufficient-permissions-error

Debian Squeeze Bluetooth Headset

Working out of my “home office” these past two weeks I’ve found a few shortcomings in my setup. While on my first teleconference last week I spent an hour holding my cellphone to my head. Miserable. I never thought I’d miss a land line / speaker phone. It was an easy problem to solve though: just get a bluetooth headset on ebay. Top of the line will cost you $80 new.

Getting this to work with my cellphone was just a matter of pushing buttons. Nothing interesting. Getting it to work on my laptop (EliteBook 2560p running Debian Squeeze) was a bit of a trick. I’m a hopeless minimalist and I just want a simple GUI to manage my bluetooth devices and a new audio device to show up in alsamixer (1). The GUI can be either blueman or gnome-bluetooth. For the first time in a while I actually liked the gnome app better so that’s what I went with. Both worked fine for pairing the device.

Getting the Linux sound system to pick up the new device was beyond me. The worst part was that everything unearthed through web searches pointed to stuff that didn’t work, or referenced the bluez wiki, which has been down for two years. I tried a few times to hack together a .asoundrc and just ended up feeling stupid. Unless you’ve got a day to burn, don’t bother with this approach.

The solution turned out to be very simple once I found it: install pulseaudio and it will do all the hard stuff for you (which I like). I only came to this solution after stumbling across a post on ask.debian.net: http://ask.debian.net/questions/how-to-make-bluetooth-headset-working-with-ekiga-in-squeeze. Nice fix that works perfectly. I had my bluetooth headset working on Squeeze in a matter of minutes after installing PulseAudio.
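
For completeness, here’s roughly what the install looked like on my end. Treat this as a sketch; the package names are what I’d expect on Squeeze, so verify them with apt-cache search pulseaudio:

sudo apt-get install pulseaudio pulseaudio-module-bluetooth pavucontrol
# restart pulseaudio (as your desktop user) so it loads the bluetooth module
pulseaudio --kill
pulseaudio --start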

EliteBook 2560p Intel 82579LM Debian Squeeze Install

Started with a new employer (Citrix) today. Naturally my first task of setting up a development system was more work than I wanted it to be. Turns out the EliteBook 2560p has Intel 82579LM network hardware and the Debian Squeeze e1000e driver predates it. Using ‘testing’ is always an option but not a very stable / appealing one.

The 2.6.38 kernel and drivers have been backported to squeeze so all that’s really needed is an installer with this kernel / drivers. I didn’t know until I’d burned a good half hour searching around the web that there are unofficial Squeeze installers complete with backports: http://cdimage.debian.org/cdimage/unofficial/backports/squeeze/. One of those images is all you’ll need to get Squeeze running on a system with Intel 82579LM network hardware.

Validating IP Addresses

UPDATE: Added terminating ‘$’ in ipv4 regex as noted in comment from raorn.

I’ve been working on a fix to a system script that passes around and manipulates IP addresses. With IPv6 becoming more prevalent this script must work with IPv6 addresses not just v4. While working on this and digging around the web I ran across some stuff that I think is worth sharing.

The first thing I always do when I’m working with a new data format is write a script / function that can be used to validate it. Here’s what I came up with for IPv4 and IPv6.

IPv4 Regex

With IPv4 this is pretty boring and can be done with a one line regular expression (regex) that’s all over the web. I clean things up a bit by using shell variables but the regex should be clear:

#!/bin/sh
QUAD="25[0-5]|2[0-4][0-9]|[0-1]?[0-9]?[0-9]"
is_ipv4 () {
    # -E is needed for the alternations above; the dot must be escaped
    # to match a literal '.'
    echo "$1" | grep -Eq "^(${QUAD})(\.(${QUAD})){3}$"
    if [ $? -eq 0 ]; then
        return 1
    fi
    return 0
}

is_ipv4 "$1"
if [ $? -eq 1 ]; then
    exit 0
else
    echo "Invalid IPv4 address." >&2
    exit 1
fi
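
Assuming you’ve saved this as is_ipv4.sh, a quick smoke test looks like this:

$ sh is_ipv4.sh 192.168.1.1 && echo valid
valid
$ sh is_ipv4.sh 999.1.1.1 && echo valid
Invalid IPv4 address.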

Nothing earth shattering.

IPv6 Regex

Working with IPv6 addresses is a bit more complex. To compensate for the larger address size when representing IPv6 addresses in text, the RFC recommends a canonical textual representation with rules that allow for compression (called “zero folding”). Addresses represented in this compressed format are more difficult to validate, and the regex is much longer:

#!/bin/sh
WORD="[0-9A-Fa-f]{1,4}"
# flat address, no compressed words
FLAT="^${WORD}(:${WORD}){7}$"
# ::'s compressions excluding beginning and end edge cases
COMP2="^(${WORD}:){1,1}(:${WORD}){1,6}$"
COMP3="^(${WORD}:){1,2}(:${WORD}){1,5}$"
COMP4="^(${WORD}:){1,3}(:${WORD}){1,4}$"
COMP5="^(${WORD}:){1,4}(:${WORD}){1,3}$"
COMP6="^(${WORD}:){1,5}(:${WORD}){1,2}$"
COMP7="^(${WORD}:){1,6}(:${WORD}){1,1}$"
# trailing :: edge case, includes case of only :: (all 0's)
EDGE_TAIL="^((${WORD}:){1,7}|:):$"
# leading :: edge case
EDGE_LEAD="^:(:${WORD}){1,7}$"
is_ipv6 () {
    # -E is needed for the alternations and interval expressions above
    echo "$1" | grep -Eq "(${FLAT})|(${COMP2})|(${COMP3})|(${COMP4})|(${COMP5})|(${COMP6})|(${COMP7})|(${EDGE_TAIL})|(${EDGE_LEAD})"
    if [ $? -eq 0 ]; then
        return 1
    fi
    return 0
}

is_ipv6 "$1"
if [ $? -eq 1 ]; then
    exit 0
else
    echo "Invalid IPv6 address: $1" >&2
    exit 1
fi
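
To sanity check the patterns I run a handful of known-good and known-bad addresses through the script. A quick test harness, assuming the script above is saved as is_ipv6.sh:

#!/bin/sh
# these should all be accepted
for addr in ::1 fe80::1 2001:db8::8:800:200c:417a 1:2:3:4:5:6:7:8; do
    sh is_ipv6.sh "$addr" && echo "accepted: $addr"
done
# these should all be rejected
for addr in ::1::2 12345::1 1:2:3:4:5:6:7:8:9; do
    sh is_ipv6.sh "$addr" 2>/dev/null || echo "rejected: $addr"
done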

Folks on the web have got it right too and I definitely took a cue from Vernon Mauery. I got a bit caught up in the differences between the addresses from RFC 4291 and the recommendations in RFC 5952. The former allows for zero folding of single 16-bit 0 fields while the latter discourages this. As the “robustness principle” dictates, this validation script will identify addresses with zero folded single 16-bit 0 fields as valid, but tools producing addresses should not emit them.

I haven’t taken on any of the weirdness that is mixed hexadecimal and dotted decimal notation … that will remain for the interested reader.

Ethernet Bonding on Debian Squeeze

Spent a few minutes today searching for a howto for setting up ethernet interface bonding on a new file server I’m building. Nothing special, but I found a bunch that aren’t that great … I know, welcome to the internet, right? But I did find one that’s awesome, from tuxhelp.org.

My final config went like this:

echo -e "bonding\nmii" | sudo tee -a /etc/modules

With an /etc/network/interfaces file that looks like this:

auto lo bond0
iface lo inet loopback

iface bond0 inet dhcp
    bond_mode balance-rr
    bond_miimon 100
    bond_downdelay 200
    bond_updelay 200
    slaves eth0 eth1

What was lacking in all the other (even Debian specific) howtos is that they always used direct invocation of ifenslave and passed options to the bonding driver manually. IMHO it’s so much nicer to use the facilities built in to ifup, like the slaves option, instead of something like:

up /sbin/ifenslave bond0 eth0 eth1

That said I haven’t had much luck finding documentation for options like this specific to a driver and how to use them in the interfaces file. Given the above example I can guess but I’m looking for a definitive source … Anyone out there know?
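
In the meantime, the kernel’s own Documentation/networking/bonding.txt describes the driver options themselves, and you can at least verify the bond came up, since the driver exposes its state through /proc:

# mode, MII status and the state of each slave
cat /proc/net/bonding/bond0
# watch the failover happen: pull a cable and follow the log
tail -f /var/log/syslog | grep bond0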

Exim + Sieve issues

I spent much longer than I’d like to admit moving my mail server today. The Debian exim4 package is very easy to configure, and setting up TLS and authentication is a snap with the help of a very good Debian Administration article. I’ve also had to tweak the address_file transport to support Sieve and the fileinto action.

What I’m writing now is mostly for my own benefit so I don’t have to look this same crap up in a few years when I move my mail server again:

TLS and Auth

The howto above uses the standard exim auth from /etc/exim4/passwd but if you want to use something like the courier-authdaemon all you need to do is go further down in /etc/exim4/conf.d/auth/30_exim4-config_examples to uncomment a later section:

plain_courier_authdaemon:
login_courier_authdaemon:

All authentication should be done over TLS, and enabling this only requires a private key, a certificate, and adding the file /etc/exim4/conf.d/main/000_localmacros with the following contents:

MAIN_TLS_ENABLE = yes
MAIN_TLS_CERTIFICATE = /etc/ssl/certs/example.org.cert
MAIN_TLS_PRIVATEKEY  = /etc/exim4/example.org.key

The Debian exim daemon runs as the user Debian-exim, so it won’t be able to access files in /etc/ssl/private. You can either keep your secret key in /etc/exim4 as I’ve done above, or add the Debian-exim user to the daemon group and change the group on /etc/ssl/private to daemon. Either is reasonable, but you’ll have to add Debian-exim to the daemon group anyway so it can use the authdaemon socket.
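
In shell terms that amounts to something like this (the key path matches the macros above; adjust to suit):

# lock the private key down so only root and the exim user can read it
sudo chown root:Debian-exim /etc/exim4/example.org.key
sudo chmod 640 /etc/exim4/example.org.key
# needed for the courier authdaemon socket regardless of where the key lives
sudo adduser Debian-exim daemon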

At this point you should check to be sure your mail server isn’t an open relay; there are a number of tools available to test this. Some are websites where you simply enter the IP / domain name of your mail server. Others are tools like swaks that let you test this for yourself. A good example of using swaks for testing exim4 can be found here.
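
A minimal relay test with swaks, run from a host outside your network, looks something like this (the addresses are placeholders):

# if the server accepts this message for relay, you're an open relay
swaks --server mail.example.org \
      --from nobody@example.com \
      --to someone@some-other-domain.net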

Exim4 and Sieve

Finally it seems like the Exim4 package on Debian Squeeze may have a bug when it comes to delivering mail to users with Sieve filters in their .forward file. I kept getting an error stating:

R=userforward T=address_file defer (-21): appendfile: file or directory name "inbox" is not absolute

To debug this, a cheatsheet for getting Exim to do your bidding is essential. The best one I could find is here.

There are a number of mailing list posts out there discussing similar errors but none seemed to fix my problem. Basically the error message means that the appendfile transport isn’t able to figure out what the “inbox” from a sieve filter should be when converted to a file/directory name. I’m using maildir in a user’s home directory so I spent a few hours poking around that part of the configuration to no avail.

Eventually, in my old configuration, I found a patch to the address_file transport to help it figure out what “inbox” is:

--- a/conf.d/transport/30_exim4-config_address_file	2005-02-19 05:25:59.000000000 -0500
+++ b/conf.d/transport/30_exim4-config_address_file	2011-07-24 21:52:33.270494409 -0400
@@ -8,4 +8,8 @@
   delivery_date_add
   envelope_to_add
   return_path_add
+  directory = ${if eq{$address_file}{inbox} \
+                            {$home/Maildir/new} \
+                            {$home/Maildir/.${sg{$address_file}{^inbox[.]}{}}/new} \
+                        }

Here comes the disclaimer: I’m no Exim hacker and I can barely figure out what this does. I found it two years ago when I was setting up my mail server and had to get Sieve filters working. When I moved this mail server today I upgraded from Lenny to Squeeze and figured this may have been fixed. It wasn’t though so I had to dig through my old configs to find it again.

Adding an Online Spare to p400i RAID Controller

I’ve had an HP DL360 hosting my blog, MySQL, SMTP and IMAP servers for a while now. The server’s been great and I’ve always intended to add an online spare (a.k.a. hot spare) for a little peace of mind. I’ve finally gotten around to it, and it turns out that the syntax for using the CLI configuration tool from HP is a bit cryptic, so here are a few notes on how I got the job done.

Tools & Documentation

Tracking down the tools for the job was a bit of a pain. To save you some time the documentation you want is here: Controller Reference Guide

HP actually supports the configuration tool (hpacucli) for Debian and they provide a package through the Proliant Support Pack apt repository. You can add the repository to your sources.list file with the following line:

deb http://downloads.linux.hp.com/SDR/downloads/ProLiantSupportPack/ lenny/current non-free
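
With the repository in place, installing is the usual apt routine:

sudo apt-get update
sudo apt-get install hpacucli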

A Few Basic Commands

The general syntax for the hpacucli command is something like: hpacucli <target> <command> [parameter=value]. To do something useful, like dumping out all of the disks on the controller (I’ve only got one on my server), you’ll do something like this:

flihp@server:~$ sudo hpacucli controller slot=0 physicaldrive all show

Smart Array P400i in Slot 0 (Embedded)

   array A

      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 32.0 GB, OK)

   array B

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)

   unassigned

      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 72 GB, OK)

So there are 6 disks attached to the controller. For this task I’m concerned with those that make up array B and the unassigned disk. That disk is the one that I wish to add as an online spare to array B. But first let’s dump some info about array B:

flihp@server:~$ sudo hpacucli controller slot=0 array B show

Smart Array P400i in Slot 0 (Embedded)

   Array: B
      Interface Type: SAS
      Unused Space: 0 MB
      Status: OK

That’s not very helpful … Let’s try something else. The documentation says something about logical drives so let’s try that:

flihp@www:~$ sudo hpacucli controller slot=0 array B logicaldrive all show

Smart Array P400i in Slot 0 (Embedded)

   array B

      logicaldrive 2 (136.7 GB, RAID 1+0, OK)

That’s what I wanted to see: the logical drives that are on array B. As you can see this one is 136.7 GB and configured as RAID 1+0. This makes sense since it’s made up of 4x72GB SAS drives. I’ve only allocated one logical drive on this array because I’m using LVM to create logical volumes in software. This is just how I like to do things. It may very well be faster and just as convenient to allocate more logical drives at the controller level but that’s another debate for another time. For now let’s stay focused on adding the unallocated disk as an online spare to array B.

Assigning an Online Spare

The specific syntax is spelled out in the manual I linked above. We’ve gathered all the necessary data for the command above and it looks like this:

flihp@server:~$ sudo hpacucli controller slot=0 array B add spares=2I:1:6

The target is controller slot=0 array B. This is the identifier for the array discussed above. The command is add spares, which is pretty self explanatory. The last part is the identifier for the physical device we’re adding as a spare. If you scroll up a bit you’ll see that I got this identifier by asking the controller to dump info on all attached physical drives. If you want to see which drive this is on your system you can actually make its drive light flash, which I thought was pretty cool (see the manual for details and the sketch below).
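
For the curious, the LED trick looks something like this; the modify led syntax is from the reference guide, so double check it against your hpacucli version:

# blink the LED on the drive in bay 6 so you can find it in the chassis
sudo hpacucli controller slot=0 physicaldrive 2I:1:6 modify led=on
# ... and turn it back off
sudo hpacucli controller slot=0 physicaldrive 2I:1:6 modify led=off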

Executing the add spares command produced no output, so you can either assume everything went as planned or check up on the array we just modified:

flihp@www:~$ sudo hpacucli controller slot=0 array B physicaldrive all show

Smart Array P400i in Slot 0 (Embedded)

   array B

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 72 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 72 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 72 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 72 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 72 GB, OK, spare)

Sweet! Now that extra drive is lined up to be a fail-over if one of the other drives in the array fails.

madwifi on lenny router

For the past two months my day job has taken over my life. As always, at the height of my insane work schedule my wireless router started acting up. I guess I needed something to break so I had an excuse to take a much needed break from work. I’d pretty much forgotten I even had a wireless access point because it had been so long since it required any attention. In the process I learned a few things that are worth mentioning here.

First off, the router I had to fix wasn’t an off-the-shelf 802.11 box. It was actually my first PC Engines router, built on the old wrap1e103. It ran Debian Sarge and had an Atheros AR5413 802.11abg card which was hot shit when I bought it.

Needless to say there wasn’t any reason to try to fix this system. It’s old, tired and wasn’t experiencing any specific problems except running really slow every once in a while. Best bet was to upgrade.

Almost a year ago I blogged about a VPN gateway that I built on a new PC Engines ALIX platform. I drafted that system to replace this dying Sarge box. Upgrading to Lenny and faster hardware was long overdue. It wasn’t all smooth sailing though.

Turns out the madwifi drivers are in flux and the drivers available on Lenny are pretty unstable. Luckily a little Googling turned up someone who had already done the research and solved this problem! Their fix did the trick and likely saved me a full night of scouring the interwebs. Following these directions the madwifi drivers came up fine in hostap mode and the box was offering DNS and DHCP through dnsmasq within minutes.

I even had enough time left over to set up the VPN so I can login to my home network while I’m on the road. Since I’ve been traveling every other week all summer this is going to come in handy. The iptables rules for this got pretty interesting so I’ll probably post something about that in the near future.

Xen Network Driver Domain: How

In my last post I went into the reasons why exporting the network hardware from dom0 to an unprivileged driver domain is good for security. This time the “how” is our focus. The documentation out there isn’t perfect and it could use a bit of updating so expect to see a few edits to the relevant Xen wiki page [1] in the near future.

Basic setup

How you configure your Xen system is super important. The remainder of this post assumes you’re running the latest Xen from the unstable mercurial repository (4.0.1) with the latest 2.6.32 paravirt_ops kernel [2] from Jeremy Fitzhardinge’s git tree (2.6.32.16). If you’re running older versions of either Xen or the Linux kernel this may not work so you should consider updating.

For this post I’ll have 3 virtual machines (VMs).

  1. the administrative domain (dom0) which is required to boot the system
  2. an unprivileged domain (domU) that we’ll call “nicdom” which is short for network interface card (NIC) domain. You guessed it, this will become our network driver domain.
  3. another unprivileged domain (domU or client domain) that will get its virtual network interface from nicdom

I don’t really care how you build your virtual machines. Use whatever method you’re comfortable with. Personally I’m a command line junkie so I’ll be debootstrapping mine on LVM partitions as minimal Debian squeeze/sid systems running the latest pvops kernel. Initially the configuration files used to start up these two domUs will be nearly identical:
nicdom:

kernel="/boot/vmlinuz-2.6.32.16-xen-amd64"
ramdisk="/boot/initrd.img-2.6.32.16-xen-amd64"
memory=256
name="nicdom"
disk=["phy:/dev/lvgroup/nicdom_root,xvda,w"]
root="/dev/xvda ro"
extra="console=hvc0 xencons=tty"

client domain:

kernel="/boot/vmlinuz-2.6.32.16-xen-amd64"
ramdisk="/boot/initrd.img-2.6.32.16-xen-amd64"
memory=1024
name="client"
disk=[
    "phy:/dev/lvgroup/client_root,xvda,w",
    "phy:/dev/lvgroup/client_swap,xvdb,w",
]
root="/dev/xvda ro"
extra="console=hvc0 xencons=tty"

I’ve given the client a swap partition and more RAM because I intend to turn it into a desktop. The nicdom (driver domain) has been kept as small as possible since it’s basically a utility that won’t have many logins. Obviously there’s more to it than just loading up these config files, but installing VMs is beyond the scope of this document.

PCI pass through

The first step in configuring the nicdom is passing the network card directly through to it, and the xen-pciback driver is the mechanism for this. It hides the PCI device from dom0, which will later allow us to bind the device to a domU through configuration when we boot it using xm.

There are two ways to configure the xen-pciback driver:

  1. kernel parameters at dom0 boot time
  2. dynamic configuration using sysfs

xen-pciback kernel parameter

The first is the easiest so we’ll start there. You need to pass the kernel some parameters to tell it which PCI device to pass to the xen-pciback driver. Your grub kernel line should look something like this:

module /vmlinuz-2.6.32.16-xen-amd64 /vmlinuz-2.6.32.16-xen-amd64 root=/dev/something ro console=tty0 xen-pciback.hide=(00:19.0) intel_iommu=on

The important part here is the xen-pciback.hide parameter that identifies the PCI device to hide. I’m using a mixed Debian squeeze/sid system so getting used to grub2 is a bit of a task. Automating the configuration through grub is outside the scope of this document so I’ll assume you have a working grub.cfg or a way to build one.

Once you boot up your dom0 you’ll notice that lspci still shows the PCI device. That’s fine because the device is still there; the kernel is just ignoring it. What’s important is that when you run ip addr you don’t see a network device for this PCI device. On my system all I see is the loopback (lo) device, no eth0.

dynamic configuration with sysfs

If you don’t want to restart your system you can pass the network device to the xen-pciback driver dynamically. First you need to unload all drivers that access the device (the e1000e driver in my case): modprobe -r e1000e.

Next we tell the xen-pciback driver to hide the device by passing it the device address:

echo "0000:00:19.0" | sudo tee /sys/bus/pci/drivers/pciback/new_slot
echo "0000:00:19.0" | sudo tee /sys/bus/pci/drivers/pciback/bind

Some of you may be thinking “what’s a slot” and I’ve got no good answer. If someone reading this knows, leave me something in the comments if you’ve got the time.

passing pci device to driver domain

Now that dom0 isn’t using the PCI device we can pass it off to our nicdom. We do this by including the line:

pci=['00:19.0']

in the configuration file for the nicdom. We can pass more than one device to this domain by placing another address between the square brackets like so:

pci=['00:19.0', '03:00.0']

Also we want to tell Xen that this domain is going to be a network driver domain and we have to configure IOMMU:

netif="yes"
extra="console=hvc0 xencons=tty iommu=soft"

Honestly I’m not sure exactly what these last two configuration lines do. There are a number of mailing list posts giving a number of magic configurations that are required to get PCI passthrough to work right. These ones worked for me so YMMV. If anyone wants to explain please leave a comment.

Now when this domU boots we can run lspci and we’ll see these two devices listed. Their addresses may be the same as in dom0 but this depends on how you’ve configured your kernel. Make sure to read the Xen wiki page for PCIPassthrough [4] as it’s quite complete.

Depending on how you’ve set up your nicdom you may already have some networking configuration in place. I’m partial to debootstrapping my installs on a LVM partition so I end up doing the network configuration by hand. I’ll dedicate a whole post to configuring the networking in the nicdom later. For now just get it working however you know how.
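
If you just want something to start from, a minimal static configuration in the nicdom’s /etc/network/interfaces could look like this (the addresses are made up; substitute your own):

auto lo eth0
iface lo inet loopback

iface eth0 inet static
    address 192.168.1.2
    netmask 255.255.255.0
    gateway 192.168.1.1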

the driver domain

As much as we want to just jump in and make the driver domain work, there are still a few configurations that we need to run through first.

Xen split drivers

Xen drivers exist in two halves, a model typically referred to as Xen split drivers [3]. The backend half of the driver is located in the domain that owns the physical device. Each client domain that is serviced by the backend has a frontend driver that exposes a virtual device for the client.

The Xen networking drivers follow this split. For our nicdom to serve its purpose we need to load the xen-netback driver along with xen-evtchn and xenfs. We’ve already discussed what the xen-netback driver does, so let’s talk about what the others are.

The xenfs driver exposes some Xen-specific stuff from the kernel to user space through the /proc file system. Exactly what this “stuff” is I’m still figuring out. If you dig into the code for the Xen tools (xenstored and the various xenstore-* utilities) you’ll see a number of references to files in /proc. From my preliminary reading this is where a lot of the xenstore data is exposed to domUs.

The xen-evtchn is a bit more mysterious to me at the moment. The name makes me think it’s responsible for the events used for communication between backend and frontend drivers but that’s just a guess.

So long story short, we need these modules loaded in nicdom:

modprobe -a xenfs xen-evtchn xen-netback

In the client we need the xenfs, xen-evtchn and the xen-netfront modules loaded.

Xen scripts and udev rules

Just like the Xen wiki says, we need to install the udev rules and the associated networking scripts. If you’re like me you like to know exactly what’s happening though, so you may want to trigger the backend / frontend and watch the events coming from udev before you blindly copy these files over.

udev events

To do this you need both the nicdom and the client VM up and running with no networking configured (see configs above). Once they’re both up, start udevadm monitor --kernel --udev in each VM. Then try to create the network front and backends using xm. This is done from dom0 with a command like:

xm network-attach client mac=XX:XX:XX:XX:XX:XX,backend=nicdom

I’ll let the man page for xm explain the parameters 🙂

In the nicdom you should see the udev events creating the backend vif:

KERNEL[timestamp] online   /devices/vif-4-0 (xen-backend)
UDEV_LOG=3
ACTION=online
DEVPATH=/devices/vif-4-0
SUBSYSTEM=xen-backend
XENBUS_TYPE=vif
XENBUS_PATH=backend/vif/4/0
XENBUS_BASE_PATH=backend
script=/etc/xen/scripts/vif-nat
vif=vif4.0

There are actually quite a few events but this one is the most important mostly because of the script and vif values. script is how the udev rule configures the network interface in the driver domain and the vif tells us the new interface name.

Really we don’t care what udev events happen in the client since the kernel will just magically create an eth0 device like any other. You can configure it using /etc/network/interfaces or any other method. If you’re interested in which events are triggered in the client I recommend recreating this experiment for yourself.

Without any udev rules and scripts in place, the xm network-attach command should fail after a timeout period. If you’re into reading network scripts or xend log files you’ll see that xend is waiting for the nicdom to report the status of the network-attach in a xenstore variable:

DEBUG (DevController:144) Waiting for 0.
DEBUG (DevController:628) hotplugStatusCallback /local/domain/1/backend/vif/3/0/hotplug-status

installing rules, scripts and tools

Now that we’ve seen the udev events we want to install the rules for Xen that will wait for the right event and then trigger the necessary script. From the udevadm output above we’ve seen that dom0 passes the script name through the udev event. This script name is actually configured in the xend-config.sxp file in dom0:

(vif-script vif-whatever)

You can use whatever xen networking script you want (bridge is likely the easiest).

So how do we install the udev rules and the scripts? Well, you could just copy them over manually (mount the nicdom partition in dom0 and literally cp them into place). This method got me in trouble though, and this detail is omitted from the relevant Xen wiki page [1]. What I didn’t know is the info I just supplied above: that dom0 waits for the driver domain to report its status through the xenstore. The networking scripts that get run in nicdom report this status, but they require some xenstore-* utilities that aren’t installed in a client domain by default.

Worse yet, I couldn’t see any logging output from the script indicating that it was trying to execute xenstore-write and failing because there wasn’t an executable by that name on its path. Once I tracked down this problem (literally two weeks of code reading and bugging people on mailing lists) it was smooth sailing. You can install these utilities by hand to keep your nicdom as minimal as possible. What I did was copy over the whole xen-unstable source tree to my home directory on nicdom with the make tools target already built. Then I just ran make -C tools install to install all of the tools.

This is a bit heavy handed since it installs xend and xenstored which we don’t need, but not a big deal IMHO at this point. That’s pretty much it. If you want your vif to be created when your client VM is created, just add a vif line to its configuration:

vif=["mac=XX:XX:XX:XX:XX:XX,backend=nic"]

Conclusion

In short, the Xen DriverDomain wiki page [1] has nearly all the information you need to get a driver domain up and running. What it’s missing are the little configuration tweaks that likely change from time to time, and the fact that the xenstore-* tools need to be installed in the driver domain. This last bit really stumped me since there seems to be virtually no debug info that comes out of the networking scripts.

If anyone out there tries to follow this leave me some feedback. There’s a lot of info here and I’m sure I forgot something. I’m interested in any way I can make this better / more clear so let me know what you think.

[1] http://wiki.xen.org/xenwiki/DriverDomain
[2] http://wiki.xensource.com/xenwiki/XenParavirtOps
[3] http://wiki.xen.org/xenwiki/XenSplitDrivers
[4] http://wiki.xen.org/xenwiki/XenPCIpassthrough

Debian Lenny make-kpkg broken with new kernel

If you use make-kpkg to build your kernels and you’re running Lenny you may have had problems building 2.6.34 when it came out. With kernel-package version 11.015 I’m getting the following error:

The UTS Release version in include/linux/version.h
""
does not match current version:
"2.6.34"
Please correct this

I’m sure more recent packages have this bug squashed but on Lenny it’s still a problem. What’s happening is make-kpkg is looking for a version string in $(KERN_ROOT)/include/linux/version.h and it’s not there. Every once in a while the kernel maintainers move stuff around and that’s exactly what happened: the UTS_RELEASE definition was moved from $(KERN_ROOT)/include/linux/utsrelease.h to $(KERN_ROOT)/include/generated/utsrelease.h.

I found it confusing that the error message lists the version.h file. Turns out this definition has been moved before and when make-kpkg can’t find it in $(KERN_ROOT)/include/linux/utsrelease.h it falls back to version.h in the same directory. So we fix it with a quick patch.

--- ./version_vars.mk	2008-11-24 12:01:32.000000000 -0500
+++ ./version_vars.mk.new	2010-06-29 21:51:50.000000000 -0400
@@ -138,10 +138,10 @@
 EXTRAV_ARG :=
 endif
 
-UTS_RELEASE_HEADER=$(call doit,if [ -f include/linux/utsrelease.h ]; then  \
-	                       echo include/linux/utsrelease.h;            \
+UTS_RELEASE_HEADER=$(call doit,if [ -f include/generated/utsrelease.h ]; then  \
+	                       echo include/generated/utsrelease.h;            \
 	                   else                                            \
-                               echo include/linux/version.h ;              \
+                               echo include/linux/utsrelease.h ;              \
 	                   fi)
 UTS_RELEASE_VERSION=$(call doit,if [ -f $(UTS_RELEASE_HEADER) ]; then                    \
                  grep 'define UTS_RELEASE' $(UTS_RELEASE_HEADER) |                       \

Download version_vars.mk.patch. Copy this patch to /usr/share/kernel-package/ruleset/misc/ and apply it:

zcat version_vars.mk.patch.gz | sudo patch -p1

If you’ve already tried to build your kernel and had it fail because of this bug, you should copy the version_vars.mk we just patched to $(KERN_ROOT)/debian/ruleset/misc/ and run make-kpkg again. This should keep you from having to rebuild the whole kernel … which takes an age on my laptop.
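
For reference, the rebuild itself is the usual make-kpkg invocation, something like:

# from the top of the kernel source tree; use --initrd if your root
# filesystem needs an initramfs
make-kpkg --initrd --revision=custom.1.0 kernel_image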

Can’t wait for Squeeze to go stable but that always comes with a whole set of new problems 🙂