giving up on sound daemons

After whipping up a quick script to kill and restart jackd when my laptop goes into S3 (dropped into /etc/pm/sleep.d) I had a revelation: any processes connected to jackd would have to reconnect. They don't, and sound is just lost. So for jackd to be usable on a laptop it needs to support S3 directly, and the version currently in the Debian unstable repositories doesn't.
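For reference, the sleep hook I tried looked roughly like this; the file name, user name and jackd options are illustrative:

#!/bin/sh
# /etc/pm/sleep.d/20_jackd -- rough sketch, adjust user and options for your setup
case "$1" in
    suspend|hibernate)
        # jackd doesn't survive S3, so just kill it before we go down
        killall jackd || true
        ;;
    resume|thaw)
        # restart jackd as the desktop user once we're back up
        su - myuser -c 'nohup jackd -d alsa -d hw:0 >/dev/null 2>&1 &'
        ;;
esac
exit 0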

I gave esound a try in place of jack, but with Totem esd ran the CPU up to 20% for a Flash movie, which is pretty nuts. It had a noticeable impact on video playback, so esound got uninstalled as quickly as it was installed.

Finally I just fell back to using alsa directly. This actually worked perfectly and I’m not sure why I didn’t just go this route in the first place.

jackd won’t wake up

My last post ended with me describing some troubles getting jackd to wake after my laptop went into S3. I managed to get some debug info out of jack after running it as a dbus service through qjackctl:

ERROR: JackAudioDriver::ProcessAsync: read error, skip cycle
alsa_driver_xrun_recovery
ERROR: ALSA: channel flush for playback failed (File descriptor in bad state)

The CPU utilization at this point goes way up.

A quick search turns up a Debian bug #601008 and bugs filed against a number of other Linux distros. There’s also an open ticket on this issue in Jack’s tracker.

So all the right people know about the issue. The Jack folks even recommend a very sane workaround: manually suspend and resume Jack before and after S3 (i.e. hack the suspend / resume scripts). I like the idea of Jack getting a notification over D-Bus even more though; that would be very clean. Hopefully we'll see some action on this soon. Till then I'm looking for a workaround.

Getting Started with the Awesome Window Manager

I’ve been a Gnome user for a while now. It’s been good to me but I’m a minimalist and lately it feels like I’ve been spending a lot of time fighting against Gnome. This past week I bought a new SSD for my laptop but instead of just copying my system over to this new disk I decided to rebuild it from scratch and give the Awesome Window Manager a try.

First off, Awesome is beyond minimal. Get comfortable on the keyboard if you give this a try. I'm a huge fan: the less I have to touch the mouse, the happier I am. The one thing that gave me trouble was getting my sound up and running. Here's how I got to a fix:

Autostart

Gnome does lots of stuff for you that I didn't even know was happening. I'm not going to get into the complicated stuff like auto-mounting encrypted volumes (that's for next time). This time it's the simple stuff: running daemons when you log in.

The Awesome wiki has an Autostart page that was super useful. I grabbed the simple run_once function and put this in rc.lua. This works for starting stuff like xscreensaver:

run_once("xscreensaver","--no-splash")

The problem I ran into with sound was actually related. Squeeze changed the way the jackd server gets started. There used to be an init script that fired it up, but no more. Squeeze dropped this behavior and now pushes the responsibility off to user-space applications, so if your app expects jackd to be running, be prepared to start it yourself.

So after installing Banshee it freaked out, crapping errors all over the console. There were tons of exceptions indicating that lots of the expected Gnome infrastructure was missing, but the one that's actually relevant to why sound wasn't working is:

Cannot connect to server socket err = No such file or directory
Cannot connect to server socket
jack server is not running or cannot be started

With an error message that says “… or cannot be started” you'd think Banshee had tried to start jackd, but once I started jackd manually the problem went away. So starting jackd when Awesome starts up is the answer!

run_once("jackd","-d alsa -d hw:0")

gnome-power-manager and S3 sleep

If you're on a laptop like me you'll want gnome-power-manager running. It's comforting having that battery icon in the tray, right? Anyway, just throw it in your rc.lua file same as before. Squeeze has great support for my ThinkPad x61s, so all of the ACPI tools and even the function keys work great for putting my system into S3.

jackd and S3 don’t play nice

The problem I’m dealing with now is that after my laptop wakes up from S3 jackd is hosed. It doesn’t crash outright but I don’t get any sound from Totem or other media players (yeah I prefer Totem over Banshee) so I’ve gotta restart it after I wake up from a suspend.

I’m not sure how to automate this yet. Hopefully it won’t come down to hacking the suspend scripts directly. I’m not even sure where these live … till next time.

managing IPsec packets with iptables

A few weeks back I upgraded my wireless router and added a racoon-based IPsec VPN. When I built my first home router / wireless access point a few years ago I did so to get some experience configuring Linux systems and writing firewall rules. I've revisited this script a number of times as I've added new functionality to the access point, and now I've added a few rules for IPsec traffic that I figured were worth mentioning here.

First off, the script implements a “default drop” policy. This means that any packet that doesn't match a rule ending in a jump to the ACCEPT target is dropped. More specifically, the default policy for the INPUT, OUTPUT and FORWARD chains is DROP:

iptables --policy INPUT DROP
iptables --policy OUTPUT DROP
iptables --policy FORWARD DROP

This means that any traffic that doesn't have an explicit rule allowing it to pass will be dropped. The design goal I had in mind was to write every rule that allows traffic as exactly as possible so as to not match any unintended traffic. This is as close to mandatory access control for network traffic as I can figure.

Rules for IKE exchange

Hopefully I'll get a chance to publish the whole script, but it's pretty long (600 lines or so) and needs some cleanup. The rules I've added to allow the racoon server are pretty clean and super useful though, so here they are.

The first stage of establishing the VPN tunnel is direct negotiation between the client and the access point (the wireless router in this case). The racoon server listens on ports 500 and 4500 and negotiates directly with the client's racoon on the same ports. I'm including just the raw rules here, but you should wrap these up in functions to keep your scripts clean (there's a sketch of that after the rules):

iptables --table filter --append INPUT \
  --in-interface extif                 \
  --proto udp                          \
  --destination ${PUBLIC_IP}           \
  --dport 500                          \
  --source 0.0.0.0/0                   \
  --sport 500                          \
  --jump ACCEPT

iptables --table filter --append INPUT \
  --in-interface extif                 \
  --proto udp                          \
  --destination ${PUBLIC_IP}           \
  --dport 4500                         \
  --source 0.0.0.0/0                   \
  --sport 4500                         \
  --jump ACCEPT

iptables --table filter --append OUTPUT \
  --out-interface extif                 \
  --proto udp                           \
  --destination 0.0.0.0/0               \
  --dport 500                           \
  --source ${PUBLIC_IP}                 \
  --sport 500                           \
  --jump ACCEPT

iptables --table filter --append OUTPUT \
  --out-interface extif                 \
  --proto udp                           \
  --destination 0.0.0.0/0               \
  --dport 4500                          \
  --source ${PUBLIC_IP}                 \
  --sport 4500                          \
  --jump ACCEPT

Notice that I've limited the interface through which the IKE traffic can be accepted and transmitted, as well as the IP of the server and the source and destination ports. The loosest part is the client's IP, which we can't know in advance. These rules are pretty standard; the interesting ones come in the next section.
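As for wrapping these in functions, here's roughly what I mean. This is just an illustrative sketch (the function name is made up); it collapses the four rules above into a single helper:

# illustrative only -- not the exact helper from my script
ike_rules () {
    local port=$1
    iptables --table filter --append INPUT \
      --in-interface extif --proto udp \
      --destination ${PUBLIC_IP} --dport ${port} \
      --source 0.0.0.0/0 --sport ${port} \
      --jump ACCEPT
    iptables --table filter --append OUTPUT \
      --out-interface extif --proto udp \
      --destination 0.0.0.0/0 --dport ${port} \
      --source ${PUBLIC_IP} --sport ${port} \
      --jump ACCEPT
}

ike_rules 500
ike_rules 4500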

Rules for VPN traffic

So now our clients can negotiate security associations with the IKE server. This allows them to get IP addresses on the VPN network but they can’t talk to any other systems on the VPN or the other networks connected to the router (my file server etc).

VPN talking to internal network

To enable this we need FORWARD rules allowing traffic from the VPN network (10.0.2.0/24, arriving on the interface 'extif') to reach the other network connected to the router (10.0.1.0/24, arriving on the interface 'wirif'). Typically this would look like:

iptables -A FORWARD         \
  --in-interface extif      \
  --source 10.0.2.0/24      \
  --out-interface wirif     \
  --destination 10.0.1.0/24 \
  --jump ACCEPT

iptables -A FORWARD         \
  --in-interface wirif      \
  --source 10.0.1.0/24      \
  --out-interface extif     \
  --destination 10.0.2.0/24 \
  --jump ACCEPT

These rules look pretty tight, right? We've limited the networks that can communicate and the interfaces on which the traffic should originate. We can do better though.

I'm particularly wary of the interface on the router that's connected to the internet. The iptables rules above will allow the traffic we want, but they don't require that traffic from the 10.0.2.0/24 network arrive over the VPN! Theoretically an attacker who isn't on the VPN could spoof traffic with a VPN IP and exchange packets with the protected 10.0.1.0/24 network. This is bad.

Policy Matching

So we need to add a policy match to these rules requiring that packets from, and to, the VPN network travel through the IPsec tunnel. I'd never even heard of the iptables policy module till I was searching for the answer to this problem. Here's what I came up with:

iptables --append FORWARD    \
  --match policy             \
  --dir in                   \
  --pol ipsec                \
  --mode tunnel              \
  --tunnel-dst ${PUBLIC_IP}  \
  --tunnel-src 0.0.0.0/0     \
  --in-interface extif       \
  --source 10.0.2.0/24       \
  --out-interface wirif      \
  --destination 10.0.1.0/24  \
  --jump ACCEPT

iptables --append FORWARD    \
  --match policy             \
  --dir out                  \
  --pol ipsec                \
  --mode tunnel              \
  --tunnel-dst 0.0.0.0/0     \
  --tunnel-src ${PUBLIC_IP}  \
  --in-interface wirif       \
  --source 10.0.1.0/24       \
  --out-interface extif      \
  --destination 10.0.2.0/24  \
  --jump ACCEPT

This allows the two networks to communicate, but this time it specifies that the packets must travel through the IPsec tunnel as we’d expect.

So now our VPN users can connect to the racoon server on the firewall and can talk to hosts behind the firewall. What’s left? Well my router / firewall also happens to have a dnsmasq server that offers clients DHCP leases and DNS service. The DNS service is particularly useful for those connecting to my storage server so they don’t have to memorize IP addresses.

Offering DNS services to VPN users

VPN users should be able to use the DNS service too (they don’t need DHCP since they get their IPs from racoon). The rules to allow this are very similar to the FORWARD rules above:

iptables --append INPUT     \
  --match policy            \
  --pol ipsec               \
  --dir in                  \
  --mode tunnel             \
  --tunnel-dst ${PUBLIC_IP} \
  --tunnel-src 0.0.0.0/0    \
  --protocol udp            \
  --in-interface extif      \
  --source 10.0.2.0/24      \
  --source-port 1024:65535  \
  --destination 10.0.1.1/24 \
  --destination-port 53     \
  --jump ACCEPT

iptables --append OUTPUT        \
  --match policy                \
  --pol ipsec                   \
  --dir out                     \
  --mode tunnel                 \
  --tunnel-dst 0.0.0.0/0        \
  --tunnel-src ${PUBLIC_IP}     \
  --protocol udp                \
  --source 10.0.1.1/24          \
  --source-port 53              \
  --out-interface extif         \
  --destination 10.0.2.0/24     \
  --destination-port 1024:65535 \
  --jump ACCEPT

These rules are pretty self-explanatory once you understand the syntax. There's a lot of information packed into them too: we identify the direction of the packets through the IPsec tunnel, the endpoints of the tunnel, and the standard IP stuff: source address and port, destination address and port, as well as the interfaces.

That's it. Eight rules are all that's required to allow VPN users to connect, speak to hosts on my internal network and talk to the DNS server. Using the policy match these rules are pretty exact … if there's any way I can make them better let me know in the comments.

madwifi on lenny router

For the past two months my day job has taken over my life. As always, during the height of my insane work schedule my wireless router started acting up. I guess I needed something to break so I had an excuse to take a much-needed break from work. I'd pretty much forgotten I even had a wireless access point because it had been so long since it required any attention. In the process I even learned a few things worth mentioning here.

First off, the router I had to fix wasn't an off-the-shelf 802.11 box. It was actually my first PC Engines router, built on the old wrap1e103. It ran Debian Sarge and had an Atheros AR5413 802.11abg card, which was hot shit when I bought it.

Needless to say there wasn't any reason to try to fix this system. It was old, tired and wasn't experiencing any specific problems except running really slowly every once in a while. The best bet was to upgrade.

Almost a year ago I blogged about a VPN gateway that I built on a new PC Engines ALIX platform. I drafted that system to replace this dying Sarge box. Upgrading to Lenny and faster hardware was long overdue. It wasn't all smooth sailing though.

Turns out the madwifi drivers are in flux and the drivers available on Lenny are pretty unstable. Luckily a little Googling turned up someone who already did the research for me and solved this problem! Their fix did the trick and likely saved me a full night of scouring the interwebs. Following these directions the madwifi drivers came up fine in hostap mode and were offering DNS and DHCP through dnsmasq within minutes.

I even had enough time left over to set up the VPN so I can login to my home network while I’m on the road. Since I’ve been traveling every other week all summer this is going to come in handy. The iptables rules for this got pretty interesting so I’ll probably post something about that in the near future.

Debian Logo Rights and Misuse

I've been traveling a whole bunch for work lately. A month or two ago, while I was on the road, the hotel I was staying in was hosting a job fair put on by a company called the Catalyst Career Group. I couldn't help but notice that their logo looked familiar.

So after standing on my head in the middle of the lobby I realized that it’s just the Debian logo upside down. The Debian project is pretty explicit about the license they distribute their logo under [1] and this doesn’t follow the license by my reading of it.

Now what to do about it? The Debian legal mailing list [2] looks like the right place. I’ll update on the results of this one. Interesting stuff.

[1] http://www.debian.org/logos/
[2] http://www.debian.org/legal/

Xen Network Driver Domain: How

In my last post I went into the reasons why exporting the network hardware from dom0 to an unprivileged driver domain is good for security. This time the “how” is our focus. The documentation out there isn’t perfect and it could use a bit of updating so expect to see a few edits to the relevant Xen wiki page [1] in the near future.

Basic setup

How you configure your Xen system is super important. The remainder of this post assumes you’re running the latest Xen from the unstable mercurial repository (4.0.1) with the latest 2.6.32 paravirt_ops kernel [2] from Jeremy Fitzhardinge’s git tree (2.6.32.16). If you’re running older versions of either Xen or the Linux kernel this may not work so you should consider updating.

For this post I'll be using three virtual machines (VMs):

  1. the administrative domain (dom0) which is required to boot the system
  2. an unprivileged domain (domU) that we’ll call “nicdom” which is short for network interface card (NIC) domain. You guessed it, this will become our network driver domain.
  3. another unprivileged domain (domU or client domain) that will get its virtual network interface from nicdom

I don’t really care how you build your virtual machines. Use whatever method you’re comfortable with. Personally I’m a command line junkie so I’ll be debootstrapping mine on LVM partitions as minimal Debian squeeze/sid systems running the latest pvops kernel. Initially the configuration files used to start up these two domUs will be nearly identical:
nicdom:

kernel="/boot/vmlinuz-2.6.32.16-xen-amd64"
ramdisk="/boot/initrd.img-2.6.32.16-xen-amd64"
memory=256
name="nicdom"
disk=["phy:/dev/lvgroup/nicdom_root,xvda,w"]
root="/dev/xvda ro"
extra="console=hvc0 xencons=tty"

client domain:

kernel="/boot/vmlinuz-2.6.32.16-xen-amd64"
ramdisk="/boot/initrd.img-2.6.32.16-xen-amd64"
memory=1024
name="client"
disk=[
    "phy:/dev/lvgroup/client_root,xvda,w",
    "phy:/dev/lvgroup/client_swap,xvdb,w",
]
root="/dev/xvda ro"
extra="console=hvc0 xencons=tty"

I've given the client a swap partition and more RAM because I intend to turn it into a desktop. The nicdom (driver domain) has been kept as small as possible since it's basically a utility domain that won't see many logins. Obviously there's more to it than just loading up these config files, but a full walk-through of installing VMs is beyond the scope of this document.
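For reference though, creating one of these guests looks roughly like this on my system; the volume group, sizes, suite and mirror here are illustrative:

# rough sketch of building a minimal guest root on LVM
lvcreate -L 2G -n nicdom_root lvgroup
mkfs.ext3 /dev/lvgroup/nicdom_root
mount /dev/lvgroup/nicdom_root /mnt
debootstrap --arch amd64 sid /mnt http://ftp.us.debian.org/debian
chroot /mnt passwd    # set a root password, fix up fstab, interfaces, etc.
umount /mnt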

PCI pass through

The first step in configuring the nicdom is passing the network card directly through to it, and that starts with the xen-pciback driver. It hides the PCI device from dom0, which later allows us to bind the device to a domU through its configuration when we boot it using xm.

There are two ways to configure the xen-pciback driver:

  1. kernel parameters at dom0 boot time
  2. dynamic configuration using sysfs

xen-pciback kernel parameter

The first is the easiest so we’ll start there. You need to pass the kernel some parameters to tell it which PCI device to pass to the xen-pciback driver. Your grub kernel line should look something like this:

module /vmlinuz-2.6.32.16-xen-amd64 /vmlinuz-2.6.32.16-xen-amd64 root=/dev/something ro console=tty0 xen-pciback.hide=(00:19.0) intel_iommu=on

The important part here is the xen-pciback.hide parameter that identifies the PCI device to hide. I’m using a mixed Debian squeeze/sid system so getting used to grub2 is a bit of a task. Automating the configuration through grub is outside the scope of this document so I’ll assume you have a working grub.cfg or a way to build one.

Once you boot up your dom0 you'll notice that lspci still shows the PCI device. That's fine because the device is still there; it's just that the kernel is ignoring it. What's important is that when you issue an ip addr you don't have a network device for this PCI device. On my system all I see is the loopback (lo) device, no eth0.
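A quick way to double check (the device names obviously depend on your hardware):

lspci | grep -i ethernet    # the NIC is still enumerated on the PCI bus
ip addr                     # but no eth0 shows up, only lo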

dynamic configuration with sysfs

If you don't want to restart your system you can pass the network device to the xen-pciback driver dynamically. First you need to unload any driver bound to the device, which in my case is the e1000e driver: modprobe -r e1000e.

Next we tell the xen-pciback driver to hide the device by passing it the device address:

echo "0000:00:19.0" | sudo tee /sys/bus/pci/drivers/pciback/new_slot
echo "0000:00:19.0" | sudo tee /sys/bus/pci/drivers/pciback/bind

Some of you may be thinking “what’s a slot” and I’ve got no good answer. If someone reading this knows, leave me something in the comments if you’ve got the time.

passing pci device to driver domain

Now that dom0 isn’t using the PCI device we can pass it off to our nicdom. We do this by including the line:

pci=['00:19.0']

in the configuration file for the nicdom. We can pass more than one device to this domain by placing another address between the square brackets like so:

pci=['00:19.0', '03:00.0']

Also we want to tell Xen that this domain is going to be a network driver domain and we have to configure IOMMU:

netif="yes"
extra="console=hvc0 xencons=tty iommu=soft"

Honestly I'm not sure exactly what these last two configuration lines do. There are a number of mailing list posts listing various magic configurations that are required to get PCI passthrough to work right. These ones worked for me so YMMV. If anyone wants to explain please leave a comment.

Now when this domU boots we can lspci and we'll see these two devices listed. Their addresses may be the same as in dom0, but this depends on how you've configured your kernel. Make sure to read the Xen wiki page for PCIPassthrough [4] as it's quite complete.

Depending on how you've set up your nicdom you may already have some networking configuration in place. I'm partial to debootstrapping my installs on an LVM partition so I end up doing the network configuration by hand. I'll dedicate a whole post to configuring the networking in the nicdom later. For now just get it working however you know how.

the driver domain

As much as we want to just jump in and make the driver domain work, there are still a few configuration steps that we need to run through first.

Xen split drivers

Xen split drivers exist in two halves. The backend of the driver is located in the domain that owns the physical device. Each client domain that is serviced by the backend has a frontend driver that exposes a virtual device for the client. This is typically referred to as xen split drivers [3].

The Xen networking drivers exist in two halves. For our nicdom to serve its purpose we need to load the xen-netback driver along with xen-evtchn and xenfs. We've already discussed what the xen-netback driver does, so let's talk about what the others are.

The xenfs driver exposes some Xen-specific stuff from the kernel to user space through the /proc file system. Exactly what this “stuff” is I'm still figuring out. If you dig into the code for the Xen tools (xenstored and the various xenstore-* utilities) you'll see a number of references to files in /proc. From my preliminary reading this is where a lot of the xenstore data is exposed to domUs.

The xen-evtchn driver is a bit more mysterious to me at the moment. The name makes me think it's responsible for the event channels used for communication between backend and frontend drivers, but that's just a guess.

So long story short, we need these modules loaded in nicdom:

modprobe -a xenfs xen-evtchn xen-netback

In the client we need the xenfs, xen-evtchn and the xen-netfront modules loaded.
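Assuming the same module names, something like this in the client should do it:

modprobe -a xenfs xen-evtchn xen-netfront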

Xen scripts and udev rules

Just like the Xen wiki says, we need to install the udev rules and the associated networking scripts. If you’re like me you like to know exactly what’s happening though, so you may want to trigger the backend / frontend and see the events coming from udev before you just blindly copy these files over.

udev events

To do this you need both the nicdom and the client VM up and running with no networking configured (see the configs above). Once they're both up, start udevadm monitor --kernel --udev in each VM. Then try to create the network frontend and backend using xm. This is done from dom0 with a command like:

xm network-attach client mac=XX:XX:XX:XX:XX:XX,backend=nicdom

I’ll let the man page for xm explain the parameters 🙂

In the nicdom you should see the udev events creating the backend vif:

KERNEL[timestamp] online   /devices/vif-4-0 (xen-backend)
UDEV_LOG=3
ACTION=online
DEVPATH=/devices/vif-4-0
SUBSYSTEM=xen-backend
XENBUS_TYPE=vif
XENBUS_PATH=backend/vif/4/0
XENBUS_BASE_PATH=backend
script=/etc/xen/scripts/vif-nat
vif=vif4.0

There are actually quite a few events but this one is the most important, mostly because of the script and vif values. script is how the udev rule knows what to run to configure the network interface in the driver domain, and vif tells us the name of the new interface.

Really we don't care what udev events happened in the client since the kernel will just magically create an eth0 device like any other. You can configure it using /etc/network/interfaces or any other method. If you're interested in which events are triggered in the client I recommend recreating this experiment for yourself.
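For example, a minimal static setup in the client's /etc/network/interfaces might look like this (the addresses are purely illustrative):

auto eth0
iface eth0 inet static
    address 192.168.0.10
    netmask 255.255.255.0
    gateway 192.168.0.1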

Without any udev rules and scripts in place the xm network-attach command should fail after a timeout period. If you're into reading network scripts or xend log files you'll see that xend is waiting for the nicdom to report the status of the network-attach in a xenstore variable:

DEBUG (DevController:144) Waiting for 0.
DEBUG (DevController:628) hotplugStatusCallback /local/domain/1/backend/vif/3/0/hotplug-status

installing rules, scripts and tools

Now that we've seen the udev events we want to install the rules for Xen that will wait for the right event and then trigger the necessary script. From the udevadm output above we've seen that dom0 passes the script name through the udev event. This script name is actually configured in the xend-config.sxp file in dom0:

(vif-script vif-whatever)

You can use whatever xen networking script you want (bridge is likely the easiest).
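For example, sticking with the stock bridge scripts the line would be:

(vif-script vif-bridge)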

So how do you install the udev rules and the scripts? Well, you could just copy them over manually (mount the nicdom partition in dom0 and literally cp them into place). This method got me in trouble though, and this detail is omitted from the relevant Xen wiki page [1]. What I didn't know is the info I just supplied above: that dom0 waits for the driver domain to report its status through the xenstore. The networking scripts that get run in nicdom report this status, but they require some xenstore-* utilities that aren't installed in a client domain by default.

Worse yet, I couldn't see any logging output from the script indicating that it was trying to execute xenstore-write and failing because there wasn't an executable by that name on its path. Once I tracked down this problem (literally two weeks of code reading and bugging people on mailing lists) it was smooth sailing. You can install these utilities by hand to keep your nicdom as minimal as possible. What I did was copy the whole xen-unstable source tree to my home directory on nicdom with the make tools target already built. Then I just ran make -C tools install to install all of the tools.
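If you'd rather go the minimal route and copy the xenstore clients over by hand, it would look something like this; the paths within the built xen-unstable tree are from memory, so double check them against your own build:

# run from the built xen-unstable tree; copies just the xenstore clients
scp tools/xenstore/xenstore-* root@nicdom:/usr/local/bin/
scp tools/xenstore/libxenstore.so* root@nicdom:/usr/local/lib/
ssh root@nicdom ldconfig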

Running make -C tools install is a bit heavy handed since it installs xend and xenstored, which we don't need. Not a big deal IMHO at this point. That's pretty much it. If you want your vif to be created when your client VM is created just add a vif line to its configuration:

vif=["mac=XX:XX:XX:XX:XX:XX,backend=nicdom"]

Conclusion

In short, the Xen DriverDomain wiki page [1] has nearly all the information you need to get a driver domain up and running. What it's missing are the little configuration tweaks that likely change from time to time, and the fact that the xenstore-* tools need to be installed in the driver domain. This last bit really stumped me since there seems to be virtually no debug info that comes out of the networking scripts.

If anyone out there tries to follow this leave me some feedback. There’s a lot of info here and I’m sure I forgot something. I’m interested in any way I can make this better / more clear so let me know what you think.

[1] http://wiki.xen.org/xenwiki/DriverDomain
[2] http://wiki.xensource.com/xenwiki/XenParavirtOps
[3] http://wiki.xen.org/xenwiki/XenSplitDrivers
[4] http://wiki.xen.org/xenwiki/XenPCIpassthrough

Xen Network Driver Domain: Why

If you’ve been watching the xen-user, xen-devel or the xen channel on freenode you’ve probably seen me asking questions about setting up a driver domain. You also may have noticed that the Xen wiki page dedicated to the topic [1] is a bit light on details. I’ve even had a few people contact me directly through email and my blog to see if I have this working yet which I think is great. I’m glad to know there are other people interested in this besides me.

This post is the first in a series in which I'll explain how I went about configuring Xen to bind a network device to an unprivileged domain (typically called a domU) and how I configured this domU (running Linux) to act as a network backend for other Linux domUs. This first post will frame the problem and tell you why you should care. My next post will dig into the details of what needs to be done to set up a network driver domain.

Why

First off, why is this important? Every Xen configuration I've seen has had all devices hosted in the administrative domain (dom0) and everything worked fine. What do we gain by removing the device from dom0 and having it hosted in a domU?

The answer to these questions is all about security. If all you care about is functionality then don’t bother configuring a driver domain. You get no new “features” and no performance improvement (that I know of). What you do get is a dom0, the most security critical domain, with a reduced attack surface.

Let’s consider an attack scenario to make this concrete: Say an exploit exists in whichever Linux network driver you use. This exploit allows a remote attacker to send a specially crafted packet to your NIC and execute arbitrary code in your kernel. This is a worst case scenario, probably not something that will happen but it is possible. If dom0 is hosting this and all other devices and their drivers your system is hosed. The attacker can manipulate all of your domUs, the data in your domUs, everything.

Now suppose you've bound your NIC to a domU and configured this domU to act as a network backend for other domUs. Using the same (slightly far-fetched) vulnerability as an example, have we reduced the impact of the exploit?

The driver domain makes a significant difference here. Instead of having malicious code executing in dom0, it’s in an unprivileged domain. This isn’t to say that the exploit has no effect on the system. What we have achieved though is reducing the effects of the exploit. Instead of a full system compromise we’re now faced with a single unprivileged domain compromise and a denial of service on the networking offered to the other VMs.

The attacker can take down the networking for all of your VMs, but they can't get at their disks, they can't shut them down, and they can't create new VMs. Sure, the attacker could sit in the driver domain and snoop on your domUs' traffic, but this sort of snooping is possible remotely too, and the appropriate use of encryption solves the disclosure problem. In the worst case the attacker could use this exploit as a first step in a more complex attack on the other VMs by manipulating the network driver frontends and possibly triggering another exploit.

In short a driver domain can reduce the attack surface of your Xen system. It’s not a cure-all but it’s good for your overall system security. I’m pretty burned out so I’ll wrap up with the matching “How” part of this post in the next day or two. Stay tuned.

[1] http://wiki.xen.org/xenwiki/DriverDomain

Debian Lenny make-kpkg broken with new kernel

If you use make-kpkg to build your kernels and you’re running Lenny you may have had problems building 2.6.34 when it came out. With kernel-package version 11.015 I’m getting the following error:

The UTS Release version in include/linux/version.h
""
does not match current version:
"2.6.34"
Please correct this

I'm sure more recent packages have this bug squashed, but on Lenny it's still a problem. What's happening is make-kpkg is looking for a version string in $(KERN_ROOT)/include/linux/version.h and it's not there. Every once in a while the kernel maintainers move stuff around, and that's exactly what happened. The UTS_RELEASE definition was moved from $(KERN_ROOT)/include/linux/utsrelease.h to $(KERN_ROOT)/include/generated/utsrelease.h.

I found it confusing that the error message lists the version.h file. Turns out this definition has been moved before and when make-kpkg can’t find it in $(KERN_ROOT)/include/linux/utsrelease.h it falls back to version.h in the same directory. So we fix it with a quick patch.

--- ./version_vars.mk	2008-11-24 12:01:32.000000000 -0500
+++ ./version_vars.mk.new	2010-06-29 21:51:50.000000000 -0400
@@ -138,10 +138,10 @@
 EXTRAV_ARG :=
 endif
 
-UTS_RELEASE_HEADER=$(call doit,if [ -f include/linux/utsrelease.h ]; then  \
-	                       echo include/linux/utsrelease.h;            \
+UTS_RELEASE_HEADER=$(call doit,if [ -f include/generated/utsrelease.h ]; then  \
+	                       echo include/generated/utsrelease.h;            \
 	                   else                                            \
-                               echo include/linux/version.h ;              \
+                               echo include/linux/utsrelease.h ;              \
 	                   fi)
 UTS_RELEASE_VERSION=$(call doit,if [ -f $(UTS_RELEASE_HEADER) ]; then                    \
                  grep 'define UTS_RELEASE' $(UTS_RELEASE_HEADER) |                       \

Download version_vars.mk.patch. Copy this patch to /usr/share/kernel-package/ruleset/misc/ and apply it:

zcat version_vars.mk.patch.gz | sudo patch -p1

If you’ve already tried to build your kernel and had it fail because of this bug you should copy the version_vars.mk we just patched to $(KERN_ROOT)/debian/ruleset/misc/ and run make-kpkg again. This should keep you from having to rebuild the whole kernel … which takes an age on my laptop.

Can’t wait for Squeeze to go stable but that always comes with a whole set of new problems 🙂

xend fails silently

I’ve been giving myself a crash course in Xen recently. I’ve played with Xen in the past but it’s always annoyed me that the kernel for dom0 was super old (2.6.18). Debian has always been good with their support and the 2.6.26 kernel that ships with Lenny has worked well.

Now that Xen 4.0 is out and I'm getting ready to transition over to Squeeze, I wanted to go back and find where the upstream dom0 kernel was maintained. Turns out there's a git tree hosted on kernel.org with a stable 2.6.32 and more bleeding-edge kernels as well. Building both Xen 4 and the 2.6.32.10 kernel from git was pretty uneventful. I did however run into a problem getting xend to start.

There’s nothing worse than a program that fails without giving any good indication as to why. xend failed with this error message in /var/log/xen/xend-debug.log:

Error: (111, 'Connection refused')

That’s it. Not helpful at all. I started fishing through an strace but got lost in the output pretty quick. Luckily there was some help to be had in the #xen channel on freenode.

The folks on IRC told me it’s quite common for xend to fail quietly like this if there’s a needed xen kernel module that isn’t loaded. The usual suspects are xen-evtchn or xen-gntdev but it may be some other component that wasn’t compiled in. In my situation I had xen-gntdev built in but xen-evtchn was built as a module and hadn’t been loaded. A quick modprobe later and xend was up and running. I went one step further and recompiled this piece into the kernel directly.
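In case it saves someone an strace, the quick fix on my system amounted to roughly this:

modprobe xen-evtchn
/etc/init.d/xend restart    # or just: xend restart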

Now I’m happily running Xen 4.0 with a 2.6.32 kernel on Debian Squeeze. Good times.