Running OP for the First Time

OP is a little wonky to run the first time. Not because it doesn’t work but because it doesn’t appear to. Sounds strange right? Well the first time I fired it up (by running the run-op2 script in the root of the source tree) it loaded up the web page of one of the students working on the project. His name is Shou Tang [1].

Unfortunately OP loads most of his page, sometimes the whole thing, and then seems to get lost. Attempts to load any other page will fail. The little icon in the tab which indicates that it’s “thinking” continues to spin but nothing happens. If you fire up tcpdump and watch for http traffic you’ll see that OP is making web requests and getting responses but it doesn’t render the page at all.

Unfortunately there isn’t a whole lot of debug output. Since I’m not very familiar with the architecture yet there may be some debug output that I just don’t know where to find. There is a warning that gets dumped to the console indicating that a clientClosed signal is being ignored, but this warning is generated by a lot of pages, not just Shou’s page. I did get a few weird errors from X but I couldn’t seem to reproduce these reliably.

So to get OP working you need to fire it up using the run-op2 script and let it get all messed up by Shou’s page. Then pull down the Menu button that’s in the upper right side of the screen. Go into the Edit tab and select Preferences. In here change the default home page to something that won’t muck up the browser (I chose Google). Then restart the browser. This time you’ll start out on a different home page and hopefully this one won’t screw up OP.

I’m gonna hold off on speculating as to why Shou’s page messes up OP. Fixing this bug would be nice but it’s not really my focus. My next post will detail how OP starts up and hopefully some architectural details (which parts talk to each other etc). Documenting this will be the first step towards laying out what a Type Enforcement policy (a la SELinux) for OP should look like.

[1] http://www.cs.uiuc.edu/homes/stang6/

OP browser install

Building / installing “research” software is always fun. OP was better than most as far as the building goes. There isn’t a way to install it (at least not through the build system) so we’ll leave that part out. For posterity the code I’m using is a tarball they put up on google code back in October [1]. I had hoped to use the svn tree that they advertise [2] but it’s just an empty directory, no code.

Their directions are pretty good. I started from their google code wiki page that has directions for doing the install on Ubuntu. I did the install on a Debian Squeeze system that I’m running as a VM on my laptop. OP uses the WebKit rendering engine and those of you familiar with building WebKit already know how long it’s gonna take me to build this in a wimpy little VM 🙂

Packages

The first time you run OP’s build script it will fail. There’s a bunch of development software you’ll need that’s not part of a default Squeeze install. Save yourself some time and just apt-get install these packages:

apt-get install gcc g++ flex bison gperf qt4-qmake libqt4-dev libsqlite3-dev libphonon-dev libxext-dev x11proto-xext-dev libfontconfig1-dev

Some of these libraries, like libphonon-dev and libxext-dev, were discovered as dependencies through trial and error: the build script errored out with some cryptic error about a missing header file (something like SomethingPhonon.h), so I searched apt-cache for a development package matching the keywords “phonon” and “dev” and found the right one. Trial and error is pretty time consuming when you’re compiling this on a very low-powered machine. Some additional packages may be pulled in as dependencies but the above list should be enough to get you what you need. If you run into any problems recreating this let me know in the comments.
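
For example, when the build bailed out on the missing Phonon header the search went roughly like this (the exact package names below are from memory and may differ slightly on your release):

# find a -dev package matching the missing header's keyword
apt-cache search phonon | grep -- -dev
apt-get install libphonon-dev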

OP recommends that you install Qt directly from Nokia but everything built fine for me using the Qt4 shipped with Squeeze. There are webkit and libqt4-webkit packages on Squeeze and I tried these first. I’m pretty sure the libqt4-webkit package is missing some headers that OP needs since the build failed looking for headers that are supplied in the Qt bindings from the WebKit source tree. Nothing’s perfect, just use the WebKit source and the Qt4 bindings that come with it.

Building WebKit

This is the part where I start complaining about building WebKit. Not just how long it takes (that’s my laptop’s fault) but the crazy build system. I guess I’ve been spoiled by all the great open source packages out there that build with the standard ./configure && make && sudo make install. WebKit ships with an autobuild.sh which will bootstrap the standard GNU autotools infrastructure, but it will only build the WebKit core; it won’t build the Qt bindings we need for OP.

OP goes above and beyond in that they ship a script that downloads the WebKit code from the “nightly build” that OP was developed against [3] and applies a set of patches too.

You can build WebKit through the script supplied by OP or you can do it yourself. If you choose the latter all you need to do is:

wget http://builds.nightly.webkit.org/files/trunk/src/WebKit-r48592.tar.bz2
tar jxvf WebKit-r48592.tar.bz2
mv WebKit-r48592 web-app/WebKit
cd WebKit; cat ../webkit_patches/*r48592.diff | patch -p0; cd ../
./web-app/WebKit/WebKitTools/Scripts/build.sh --qt --release

That’s downloading the right nightly build, extracting it, renaming the directory (OP has this path hard coded in their scripts), patching it and running the build script. I did this manually because the build kept failing and I wanted to narrow down the problem. Even with all of the right libraries installed I was getting a strange error from g++ indicating that there was a bug in the compiler itself:

g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <file:///usr/share/doc/gcc-4.4/README.Bugs> for instructions.
make[1]: *** [obj/release/FrameLoader.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory `/home/myuname/opbrowser-release-2009_09_30/webapp/WebKit/WebKitBuild/Release/WebCore'
make: *** [sub-WebCore-make_default-ordered] Error 2

Google for this error real quick and usually the problem is the machine doing the compile running out of RAM. There are some great mailing list posts from people compiling glibc on a system with 32MB of RAM and 128MB of swap space. That makes my VM look like a supercomputer (512MB of RAM and a 1GB swap disk). My first reaction, then, was that there was no way I was running out of RAM.

More swap!!

So how to test this? Run the build again and this time run in parallel the command free -m -s 2. This will poll your RAM and swap usage every 2 seconds printing some info to the console. Sure enough the build was using up all of my RAM and swap which is pretty ridiculous IMHO.

So just throw more RAM at it, right? Getting KVM to give this VM more RAM takes a restart, so I did that, giving it 1GB of RAM (double what it had previously) and leaving the 1GB of swap alone. FAIL: free still shows us running out of both RAM and swap.

OK, no messing around this time. On my host system I allocated a 5GB KVM logical volume and passed this to the VM as an additional hard disk (another restart of the VM). I then dropped the old swap space and set this 5GB disk as swap. This turned out to be enough. Watching free showed my swap usage going well over 2GB … jeez, that’s greedy. Something in this script is assuming that my system is pretty beefy.
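
For anyone recreating this, the swap swap itself is just a couple of commands inside the VM. This is only a sketch; the /dev/vdb device name is an assumption, so check dmesg for whatever name KVM actually gives the new disk:

swapoff -a          # drop the old 1GB swap
mkswap /dev/vdb     # format the new 5GB disk as swap (device name is a guess)
swapon /dev/vdb     # enable it
free -m             # confirm the extra swap is visible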

One final problem I ran into was a significant number of undefined references turning up in the final linking. This was from the build failing so many times previously. Typically you’d hope the build system would rebuild anything that failed, but that’s not the case here. In fact even if you run the build script with the --clean switch it doesn’t clean up enough to remove broken object files. I had to manually delete the WebKitBuild directory (which is under the WebKit root) and rebuild WebKit one last time. You’ll see this message when you’re done:

===========================================================
 WebKit is now built (2h:45m:25s). 
 To run QtLauncher with this newly-built code, use the
 "./run-launcher" script.
===========================================================

That’s right, almost 3 hours to build this beast.

Building OP

In comparison to WebKit, building OP was a breeze. The only additional requirements were a Java toolchain and a few Boost libraries [4]:

apt-get install ant sun-java6-jdk libboost-dev libboost-regex-dev

Installing libboost-regex-dev will pull in a bunch of boost dev packages one of which is libboost-dev. I’ve included libboost-dev in the list above just for completeness.

I’m pretty sure the OpenJDK java packages would work but since we’re trying to minimize the possible problems we may run into I just grabbed the “non-free” sun packages. That way if I end up having to get in touch with the guys that wrote OP with a question / problem they won’t have the opportunity to say “we don’t support the OpenJDK packages, make sure you’re using the genuine Sun (Oracle?) Java”. If anyone gives this a go with the OpenJDK packages let me know how it turns out in the comments.

Once you’re done accepting the licensing agreement ::sigh:: run the build script and OP should be good to go. OP ships with a build script in its root named build.sh. Again, this makes some assumptions about your system since it passes make the -j4 flag, which tells make to run four jobs in parallel. Since my VM only has one CPU I went through and removed it:

sed 's/-j4//' build.sh > build-single.sh && chmod +x build-single.sh

Then run it and you should be good to go.

I couldn’t figure out how to actually install WebKit once it was built. OP takes this on by setting the LD_LIBRARY_PATH and DYLD_LIBRARY_PATH environment variables in its launch script to contain the WebKit release library directory: web-app/WebKit/WebKitBuild/Release/lib. There is also a hard-coded reference to that path in some of OP’s Makefiles.

This can cause a problem if you don’t pass the WebKit build script the --release flag (like maybe you built it with --debug instead). OP won’t build right in this case. It will fail complaining about a bunch of undefined references. If you do this by mistake and you don’t want to rebuild WebKit (because it takes around 3 hours) you can just use a soft link.
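
As a rough sketch (assuming you built with --debug and OP is looking for the Release directory that path implies), the link would look something like:

cd web-app/WebKit/WebKitBuild
ln -s Debug Release    # point the path OP expects at the debug build output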

So now it’s built. This post is long enough so I’ll comment on running OP next time.

[1] http://code.google.com/p/op-web-browser/downloads/list
[2] http://code.google.com/p/op-web-browser/source/checkout
[3] http://builds.nightly.webkit.org/files/trunk/src/WebKit-r48592.tar.bz2
[4] http://www.boost.org/

MAC SEED Lab Requirements

It’s been way too long since my last post on this subject. I’ve been rolling around ideas in my head for the elements that will make this lab good, or bad for that matter. I’m just going to dump them here and refine them either as I run across new requirements or as I realize that something on this list is a bad idea.

  • The lab task should focus on policy development and what it means to the system as a whole. Integrity and secrecy should both be addressed as part of the lab.
  • Setup of the lab should take no effort on the part of the student. SELinux should already be installed with a known good policy on their systems. The only things they should be concerned with are writing policy (and maybe some code to confine), building the policy, inserting it into the kernel and debugging the output.
  • This lab is intended to reinforce the mandatory access control concept. It’s not an SELinux lab per se; SELinux is just the MAC system used to reinforce the MAC concept. This implies that the lab shouldn’t be bogged down in the details of managing an SELinux system. Since the labs are intended to be run from an Ubuntu VM we have to be sure SELinux is well supported or already set up on this VM.
  • Following the previous point it’s important that the MAC concepts from the class lecture be incorporated into the lab explicitly. It’s been a while since I took the class that this lab will be taught in so getting a copy of the lecture notes and ensuring I’m reinforcing the right concepts is important. I may need to make suggestions for new topics to be discussed in class but I’ll try to keep changes to the lecture minimal.
  • It would be nice if we could make practical links to a previous lab showing how MAC can defend against specific attacks. There is a lab used in this class showing buffer overflows at work. Showing the code previously developed for the buffer overflow lab thwarted by SELinux would be cool. After walking through an example this would make a good independent task for the students to undertake.
  • Simplicity. Keeping the policy we develop from getting too scary is a must.
  • The Reference Policy is by far the policy “language” to use when developing “real” policy, but exposure to the raw policy is a must. It may make sense to have students determine the raw policy needed to perform a task and then have them hunt through the reference policy interfaces searching for the right interface to use. That could get ugly though, and it may not even be practical since reading the reference policy requires a certain amount of skill. This one’s gonna take some thought.

I’m going to let these sit for a day or two and do some thinking. Refinements will follow as will a task list derived from the “final” requirements. I know, requirements are never final but I’ll pretend they are when I move on to the tasks … at least until they change. I’m interested in any comments or suggestions that the interwebs may have so let me know what you think.

dnsmasq and racoon VPN

I’ve always used the standard dhcp3 server on my local networks (non-routable IPs). Never really knew of any other options and I didn’t look for any. As the few networks I manage have gotten larger I’ve wanted my DHCP server to be able to feed information into local DNS so I don’t have to maintain hosts files or remember IP addresses. I’ve heard horror stories about configuring BIND so I figured hooking up DHCP and BIND would be way too much work for my purposes.

After some digging I ran across dnsmasq [1]. It’s a DHCP server and a DNS proxy rolled into one. As it doles out DHCP leases it populates the DNS proxy with host names, just what I need. There are a lot of good howtos out there for setting up dnsmasq so I won’t pollute the web with another one that’s likely not as good as the others. Frankly, dnsmasq can pretty much be configured with the information contained in its well-documented example config file (provided as part of the Debian package).

What I will add to the inter-tubes is how I got dnsmasq to resolve names for VPN users connected to the racoon VPN that I’ve documented in a previous post [2] without interfering with other DNS configurations on the client. This requires a few modifications to the racoon server config and the client side up/down scripts. It also takes some resolvconf magic to finish the job.

Serving DNS info to VPN Clients

The configuration required to get racoon to send DNS information to clients is pretty straightforward: just two lines in the mode_cfg section of the server’s racoon.conf.

    dns4 10.XXX.XXX.XXX;
    default_domain "vpn.example";

That’s it. The client side up script receives these config parameters in two new environment variables: INTERNAL_DNS4_LIST and DEFAULT_DOMAIN. The INTERNAL_DNS4_LIST reflects the fact that we can include the address of more than one DNS server. In this example we’ve only got one but we write our script such that it can handle the list.

In the up script we’ve got the DNS information now, but what do we do with it? I’m no expert at building resolv.conf files by hand and I really don’t want to be. We need a way to manage multiple DNS configurations at the same time such that names for hosts on the VPN network get resolved by the DNS server configuration received from racoon, while other names get resolved by whatever DNS configuration was in place when we brought up the VPN connection. The resolvconf program (yeah, bad/confusing choice of name) almost does what we need.

Client Configuration with resolvconf

The man page for resolvconf is pretty straightforward but it leaves out one specific detail. By my reading of the man page I would think to call resolvconf as follows:

  echo -e "domain ${DEFAULT_DOMAIN}nnameserver ${DNS_IP}" | resolvconf -a ${INTERFACE}

Where INTERFACE would be the name of the interface we’re talking to the VPN through.

This doesn’t actually work though. After an hour of trying multiple configurations to see what I was doing wrong, I thought to look at the script that the resolvconf package installed in my /etc/network/if-up.d directory. This script takes whatever DNS info was associated with the interface, either statically in the interfaces file or dynamically over DHCP, and feeds it into the resolvconf program. It does something funny though: the record name used isn’t actually the name of the interface. The script appends the address family to the interface name passed into resolvconf.
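
In other words, for a static eth0 entry the hook effectively does something like the following (the record name includes the address family per the description above; the domain and address are just the placeholders used elsewhere in this post):

# what the stock if-up.d hook effectively does, record named "eth0.inet" not "eth0"
echo -e "domain home.example\nnameserver 192.XXX.XXX.XXX" | resolvconf -a eth0.inet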

I tried using this convention for the VPN configuration scripts. I appended ‘.vpn’ to the interface name (very original, I know) and this time the DNS info obtained over the VPN didn’t stomp all over the existing DNS info (the configuration my laptop got from DHCP on the local network). The small addition to the racoon up script is as follows:

RESOLVCONF=$(which resolvconf)
# figure out which interface routes to the VPN gateway
INTERFACE=$(ip route get ${REMOTE_ADDR} \
    | grep --only-match 'dev[[:space:]][0-9a-zA-Z]*' | awk '{print $2}')
if [ -x "${RESOLVCONF}" ]; then
    INPUT=""
    for DNS in ${INTERNAL_DNS4_LIST}
    do
        INPUT="${INPUT}nameserver ${DNS}\n"
    done
    echo -n -e "domain ${DEFAULT_DOMAIN}\n${INPUT}" \
        | resolvconf -a "${INTERFACE}.vpn" | logger -t "phaseone-up"
fi

This is a step in the right direction but it still doesn’t work exactly as we want.

The resolv.conf file generated by resolvconf after bringing up the VPN looks like this:

nameserver 192.XXX.XXX.XXX
nameserver 10.XXX.XXX.XXX
search home.example vpn.example

Here the 192.XXX.XXX.XXX DNS server was obtained by our network interface when it was brought up using DHCP. This is the DNS server on my home network. It knows the names of devices that have registered using DHCP and when searching for a hostname that’s not qualified the suffix appended is ‘home.example’. I leave off the top level suffix to prevent the proxy from forwarding bad search requests. The 10.XXX.XXX.XXX DNS server is the one that will resolve hosts on the VPN network. Again it knows the names of devices that have registered on the VPN network using DHCP and provides the search suffix of ‘vpn.example’.

Why This Doesn’t Work

Because the home DNS server is listed before the VPN DNS server it will be queried first. When asked for a host that exists on the VPN domain the query will first be sent to the DNS server on the home.example network and the query will fail. The query falls through to the next nameserver only in the case of a timeout or an error, so the VPN DNS server will not be queried in this case and we can’t resolve names on the VPN network. If we switch their order manually we’ll be able to resolve names on the vpn.example network but attempts to resolve names on the home.example network will fail.

To make this more concrete, say we want to resolve the name ‘bob’ (like if I were to run ‘ping bob’), a system on the home.example network. We’d expect the resolver to be smart enough to search through the two available DNS servers knowing their search domains. It could ask the vpn.example DNS server for ‘bob.vpn.example’ and, if it didn’t find bob there, it could then ask the DNS server on home.example for ‘bob.home.example’. If only the resolver functions in libc were this smart.
NOTE: we’d be in trouble if each network has a host named ‘bob’ but how to handle that situation is out of scope for this discussion.

For configurations that are relatively advanced we have to fall back on a DNS proxy like dnsmasq. Yes we’re already running dnsmasq as a DNS proxy on these two networks but the problem we’re running into is that the resolver on the client isn’t smart enough. The smarts we need are built into dnsmasq.

dnsmasq as a Client-side DNS Proxy

Installing dnsmasq on the client is painless. It’s already tied into the resolvconf system so it’s notified of changes to the resolver information, but it preserves the behavior of the standard libc resolver described above. We can however statically configure dnsmasq to consult a particular DNS server for a specific domain with one configuration line:

server=/domain/ip

For the network layout described we could add two lines to the dnsmasq.conf file to get the behavior we want:

server=/home.example/192.XXX.XXX.XXX
server=/vpn.example/10.XXX.XXX.XXX

Static configurations stink though (too easy) and with a little more work we can get the same effect with a short script:

#!/bin/bash
# convert a dotted-quad ip address string to an integer
function inet_aton () {
    local count=3
    local int=0
    for num in $(echo $1 | sed -e 's/\./ /g'); do
        let "int+=$num*256**$count"
        let "count-=1"
    done
    echo $int
}
pushd "/etc/resolvconf/run/interface/" > /dev/null
# every resolvconf record except the one dnsmasq itself registers
FILES=$(/lib/resolvconf/list-records | sed -e '/^lo\.dnsmasq$/d')
PARAMS=""
for file in $FILES; do
    ns=$(cat $file | sed -n -e 's/^[[:space:]]*nameserver[[:space:]]\+//p')
    PARAMS+="uint32:$(inet_aton $ns) "
    domain=$(cat $file | sed -n -e 's/^[[:space:]]*domain[[:space:]]\+//p')
    PARAMS+="string:$domain "
done
dbus-send --system --dest='uk.org.thekelleys.dnsmasq' \
    /uk/org/thekelleys/dnsmasq uk.org.thekelleys.SetServers $PARAMS
popd > /dev/null

For this script to make sense it’s important to know that when the resolvconf system is passed DNS information for a particular interface it makes a file with the name of the interface in /etc/resolvconf/run/interface/. This last script is placed in the directory /etc/resolvconf/update.d/. Each script in this directory is run every time the resolvconf information is changed. In the script we extract the nameserver and domain information from each file and send it to dnsmasq through the dnsmasq dbus interface (which must be enabled in dnsmasq.conf).
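
If memory serves, enabling the dbus interface is a one-line addition to dnsmasq.conf (restart dnsmasq after adding it):

# in /etc/dnsmasq.conf
enable-dbus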

That’s it. Now each time we make a connection to the VPN the racoon client scripts send the VPN DNS info into resolvconf. resolvconf then runs its update.d scripts and the new script that we’ve provided takes this DNS information and sends it on to dnsmasq over its dbus interface. That was a lot of work, but now my VPN works the way I want it to. Well worth the effort IMHO.

I’m no dbus expert but I don’t really like the dnsmasq dbus interface. All functionality for manipulating the servers is packed into one function. As you can see from the above script it’s just “SetServers”. The interface would be much more effective and much easier to use if this one function were broken up into several, like an “AddServer”, “RemoveServer” etc. The full documentation for the dnsmasq dbus interface can be found here [3]. Proposing a few patches to fix this up would be a fun summer project 🙂

Racoon IPsec VPN on Debian Lenny

I’ve been wanting to set up a “pure” IPsec VPN using racoon for a while now. Part just for fun, part because I can. I spent a weekend on it once a while back, didn’t make much progress, got sick of trying to figure out cryptic racoon debug output and then gave up (more pressing stuff, you know how it is).

Anyways I ran into a situation where I NEED a VPN now. I manage a few systems that are facing the internet so they’re constantly under attack (mostly bots trying to brute force ssh all the time). Remote administration over ssh is pretty much all I need but I’d like to be able to keep a closer eye on the hardware through the “Integrated Lights-Out” system (they’re HP Proliant servers). These I don’t want facing the internet. Similarly I don’t want the configuration interfaces for my switch facing the public either.

So what I needed was a management network that I could connect to through a VPN gateway remotely (typically known as a “roadwarrior” setup). The ALIX system I’ve been blogging about in the recent past is what I’m using as the gateway / server. This post is a quick run down of the contortions I went through to get this working and why I didn’t get it working just how I want it 😦

Requirements

  • racoon only, no L2TP
  • rsasig authentication
  • as little hard coded network configuration as possible on the client

I thought the above was pretty ambitious. Configuring racoon is pretty complicated, but after poring over the man page for racoon.conf I found the mode_cfg section which specifies the network configuration for the server to send out to authenticated clients. A little more digging turned up a few examples; particularly useful were the NetBSD howto and the HowtoForge racoon roadwarrior configuration.

Both of these give a working example of using the hybrid_rsa authentication with mode_cfg. This isn’t exactly what I wanted but it solves 2 of my 3 requirements above so it’s a great start. Next was adding my own server and client certificates to the configuration and making sure that both the certificate and the remote identifier were being verified. I didn’t want to keep having to type in a password when connecting to the VPN so I moved on to getting rsasig authentication working. Naturally at this point all hell broke loose.

It took me forever to figure it out, but it looks like the ipsec-tools version that ships with Lenny (0.7.1) doesn’t play nice with rsasig authentication and mode_cfg. The client and server negotiate phase 1 without any trouble, but the client never requests the configuration data from the server. I tried all sorts of configuration combinations hoping to find something that worked, with no luck. Eventually I ran across an old and unanswered post to the ipsec-tools users mailing list from a few years back describing the same problem with the 0.7.0 version of racoon. It’s probably safe to assume that this is the same behavior I’m running into on version 0.7.1.

At this point my options were to upgrade to a later version and hope the bug was fixed or use hybrid auth. Guess which one I chose … hybrid auth ain’t so bad 🙂 Yeah, typing in a password is a PITA, but both client and server can still be configured to check each other’s certs and asn1dn identifiers in phase 1 so very little (if any) security is compromised. 2 out of 3 requirements isn’t bad. None of the desired functionality was lost but I do have to supply a password each time I connect to the VPN. Meh.

Configurations

Since the articles on HowtoForge and netbsd.org are so good I won’t bore you with a full description of my racoon configuration; it’s very similar to theirs. I will include my configs here for completeness and cover the parts where they differ.

The server.racoon.conf is very similar except for the credential verification for the client and some details in the mode_cfg. I’m using the split_network directive to send routing information to the client so we don’t have to hardcode any routes. The scripts on the client side had to be changed to accommodate this but it wasn’t that hard (I’ll get to this in a second). Also notice that I’m using all class C networks (/24 in CIDR) so no routes need to be specified on systems plugged into the management network directly.
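
To give a rough idea of what I mean, the mode_cfg portion of the server config looks something like the sketch below. This is from memory, not my exact config: the addresses are placeholders and the directive spellings should be double-checked against the racoon.conf man page.

mode_cfg {
    auth_source system;
    network4 10.XXX.XXX.XXX;
    netmask4 255.255.255.0;
    pool_size 40;
    dns4 10.XXX.XXX.XXX;
    default_domain "vpn.example";
    # hand the client the route to the management network
    # instead of hardcoding it on the client side
    split_network include 10.XXX.XXX.0/24;
}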

In my client.racoon.conf the only difference is in the verification of the server’s credentials (certs). I started out testing using certificates generated from my own root CA. When this was deployed I got certificates from cacert.org, which is a great service.

The significant changes on the client side were in the phase one up and down networking scripts. I really didn’t like the scripts from the HowtoForge article (some of them didn’t seem necessary). They were a great starting place, but since I’m using split_network I had to properly handle dynamically creating and removing these routes. The final scripts can be found here: client.phaseone-up.sh and client.phaseone-down.sh

If you look closely you’ll notice that in the phaseone-down script I flush all SAs when taking down the VPN. Obviously this isn’t what we want: we only want to delete the relevant SAs. Unfortunately version 0.7.1 of the ipsec-tools package doesn’t play well with the way Linux manages SAs so the deleteall setkey function doesn’t work right. Flush is all we’re left with unless we’re gonna parse the SADB output for the SPIs we need. This bug was reported on the ipsec-tools devel mailing list with a patch and it seemed well received, so it’s likely fixed in a later release. I’ll put off writing fancy bash scripts and just upgrade racoon soon.

Hope this is useful to everyone out there setting up a roadwarrior friendly racoon VPN. Leave me a comment if this was useful or if any of it’s unclear.

Amazon mp3 Downloader Woes

First if you’re new to the Amazon MP3 downloader and you’re running Debian check out the howto on the Ubuntu forums. It’s very helpful.

Now on to my complaints (that’s what blogs are for right?)

I think it’s pretty cool that Amazon released their MP3 downloader program for Linux. That’s about the only nice thing I have to say about it. Well not really. They package it for Debian and a whole bunch of other distros so they get credit for that too.

My first gripe is that they only distribute a 32 bit x86 version. Compiling and packaging this for x86_64 (amd64) would take such a small amount of effort on their part it isn’t funny. On the client side installing a 32 bit application is a real pain. You’ve gotta stumble through downloading the 32 bit libraries by hand. Utilities like getlibs make it much less painful but it’s still not nearly as good as the dependency tracking that’s built into .deb files for a reason.

But the reason I’m writing this is that, after about 3 months of using this application without any problems, it broke this week. After buying 2 albums it started giving me an error: “Can’t connect. Please check your internet connection…” That’s just insulting guys. I’ve got Pandora streaming, 2 ssh sessions, an IMAP connection and a VPN connection open and you’re gonna tell me to check my internet connection. Right.

So after mucking around for a while it turns out this is due to a dependency on the lib32nss-mdns package. Their deb package didn’t have lib32nss-mdns as a dependency so how was I supposed to know? Even stranger is how, after months of using the downloader (and probably $200 spent on music), it suddenly stopped working. Why’d it work before? My guess is they updated (aka broke) something on the server side and since they don’t expose their application through an apt repository there was no way to notify users except by breaking the client application.

After finally figuring out what’s wrong I just went ahead and downloaded the new version of the Amazon MP3 client … just to find out that a few failed attempts to download your purchase will cause Amazon to lock you out. That’s right, I can’t download my MP3s because they broke their client. Now I’ve gotta go through customer service and ask Amazon to unlock my music. What a joke.

But there is hope: there’s an open source command line client for the Amazon mp3 store. I haven’t tried it yet but if this thing breaks again I’ll make the switch.

Using tmpfs to Minimize Disk IO

Now that I’ve got my ALIX system up and running Lenny, it’s time to tweak the configuration. One of the things I liked best about the Voyage distribution is its use of tmpfs for the directories that receive a lot of writes to minimize the IO on the compact flash (CF) card. The reason for doing this is there’s a maximum number of write cycles that can be made to the CF card. Not that I’ve actually worn out a CF card before but I don’t intend to either.

I want to have /tmp, /var/run, /var/lock and /var/log mounted as tmpfs. There are a few resources out there that provide scripts and methods for doing this but I’m not a big fan of any of them (see the references section below). Debian has almost all of the necessary machinery to perform this task with minimal custom scripting. We’ll be mucking around in the /etc/init.d and /etc/rcS.d directories, but as little as possible.

/var/run and /var/lock

A significant portion of what we want can be achieved using the features of the mountkernfs.sh script. There are two variables called RAMRUN and RAMLOCK that control whether or not /var/run and /var/lock are mounted as tmpfs respectively. These variables are set in /etc/default/rcS and the mount points are created in the /etc/init.d/mountkernfs.sh script if the associated variable is set to “yes”.
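
Assuming your /etc/default/rcS looks like mine, turning this on is just a matter of flipping the two variables:

# in /etc/default/rcS
RAMRUN=yes
RAMLOCK=yes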

There does seem to be a small bug in this script however. It does not import the variables it needs from /etc/default/rcS. I’m pretty sure this is a bug and can be fixed with a very small patch:

--- ./mountkernfs.sh.old	2010-01-02 22:32:44.000000000 -0500
+++ ./mountkernfs.sh	2010-01-02 22:33:09.000000000 -0500
@@ -18,6 +18,7 @@
 . /lib/init/mount-functions.sh
 
 [ -f /etc/default/tmpfs ] && . /etc/default/tmpfs
+[ -f /etc/default/rcS ] && . /etc/default/rcS
 
 do_start () {
 	#

/tmp and /var/log

After this we’re halfway to achieving our goal. It would be nice if the /var/log directory could be mounted as easily, but most people will tell you that having log files reside on non-persistent storage is a very bad idea. If something goes wrong and your system goes down you won’t be able to analyze your log files. This is a very real concern which we will address shortly. First, the remaining two mount points need to be mounted through /etc/fstab with the following two entries:

tmpfs  /tmp     tmpfs   defaults,noexec,nosuid,mode=1777         0   0
tmpfs  /var/log tmpfs   defaults,noexec,nosuid,nodev,mode=755  0   0

This solves the issue of mounting /tmp but /var/log requires a little more work. Debian (and Linux in general, I think) expects that some files and directories will exist in the logging directory. To account for this, after the mount scripts run we want to create the necessary file structure. I’ve done this by creating a tar archive of the expected structure and extracting it to the newly mounted tmpfs /var/log directory on each system boot. The following script (logskel.sh.gz) does exactly this:

#!/bin/sh

PATH=/sbin:/bin
. /lib/init/vars.sh
. /lib/lsb/init-functions

# get configuration info for this script
[ -e /etc/default/log-skel ] && . /etc/default/log-skel

case "$1" in
	start|"")
		log_begin_msg "$@"
		# select defaults if the configured options don't make sense
		[ -f "$SKEL" ] || SKEL=/lib/init/log-skel.tar.gz
		[ -d "$LOG_DIR" ] || LOG_DIR=/var/log
		/bin/tar -zxf ${SKEL} -C ${LOG_DIR} > /dev/null 2>&1
		log_end_msg $?
		;;
	restart|reload|force-reload)
		echo "Error: argument '$1' not supported" >&2
		exit 3
		;;
	stop)
		# No-op
		;;
	*)
		echo "Usage: $NAME [start|stop]" >&2
		exit 3
		;;
esac

You’ll need to put the archive that’s being extracted into /lib/init or specify a different location through the /etc/default/log-skel file. I’m using this structure on a system with very few daemons running: log-skel.tar.gz. You may want to build one specific to your system’s needs. The above script should be run after all file systems are mounted; on a Lenny system this is done by linking the script to /etc/rcS.d/S36log-skel.sh.
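
One way to build such an archive is to snapshot the directory layout (with empty files) from the running system before switching /var/log over to tmpfs. This is just a sketch; ownership and permissions on some log directories may need to be preserved too, so double-check the result:

mkdir /tmp/skel && cd /tmp/skel
(cd /var/log && find . -type d) | xargs mkdir -p    # recreate the directories
(cd /var/log && find . -type f) | xargs touch       # recreate the files, empty
tar -czf /lib/init/log-skel.tar.gz .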

Persistent logging of “serious” errors

Finally we still want to log “serious” error messages from syslog to persistent storage so they aren’t lost if the system reboots. This is a single rsyslog rule that can be put in the rsyslog.conf file directly or in a separate file in the /etc/rsyslog.d directory. I chose the latter: persistent.conf

*.err    /var/persistent.log

Now cross your fingers and reboot. Any messages you see during boot indicating missing log files can be fixed by adding the file to the template archive we extract in the init script above. After a successful reboot you should be able to see that these four directories are tmpfs mount points by executing the mount command. This is the full output on my ALIX system.

/dev/hda2 on / type ext2 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,nosuid,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
tmpfs on /tmp type tmpfs (rw,mode=1777)
tmpfs on /var/log type tmpfs (rw,noexec,nosuid,nodev,mode=755)

We’re interested in the /var/run, /var/lock, /tmp and /var/log lines. Success!

References

Installing Lenny on ALIX 2d3 over Serial Console

In my last post I linked to the howto forge article that gives detailed instructions for installing Debian on a PCEngines WRAP system. After playing with Voyage Linux on my new ALIX system (the successor to the WRAP) I decided that I would be better off going with the stock Debian Lenny (5.0). The cool thing is, I didn’t follow the howto 🙂 Instead I decided to exercise the PXE-boot support in the ALIX bios and learn something new in the process. This system will be a VPN gateway to a management network that I need to access remotely.

The install requires two connections between my laptop (itself running Lenny) and the ALIX system: an ethernet connection between the two for the PXE-boot and subsequent network install, and a null-modem serial cable to act as a console for the installer. That’s right, no VGA on this thing … old school. Here’s what it looks like:
ALIX-Serial
The red box is just the housing for a retractable ethernet cable.

First off, get minicom up and running. My laptop has no serial port so I broke out my newfangled USB one. Grab the ALIX manual and look up the default COM port settings: 38400 8N1, flow control = none. I had a problem with minicom failing with the following assertion:

minicom "Assertion `inptr - bytebuf > (state->__count & 7)' failed"

It seems the way to solve this is by setting the LANG environment variable to “C”. You can do this by executing minicom like this:

LANG=C minicom -c on

Plug the ALIX board in now and you should see the BIOS initializing then failing to find a disk to boot from. Reset it and this time when the BIOS is performing the memory test press the ‘S’ key to enter the BIOS settings. Change the serial baud to 9600 and while you’re in the menu enable PXE-boot by pressing ‘E’.

I recommend you change the baud setting because all the documentation I found on installing Linux using a serial console used this baud rate and I wanted to keep things consistent. Likely you can choose any rate you want as long as you’re consistent in the settings you choose … YMMV

Next we track down the directions for installing Debian using netboot. This is well documented on the Debian website but naturally there are a few catches, which I’ll cover here. Specifically, the netboot installer doesn’t support serial console installs. First we’ll worry about getting PXE-boot going, then worry about the installer.

I used the tftpd-hpa tftp server as recommended and the CMU bootp server. inetd.conf already had the necessary configuration lines for these two servers, they only need to be uncommented:

tftp           dgram   udp     wait    root  /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
bootps          dgram   udp     wait    root    /usr/sbin/bootpd        bootpd -i -t 120

Take note of the root directory for the tftp server; this had me scratching my head for a while. The ‘-s’ option on the first line is the root directory for the tftp server (see the man page for more). This is where we extract the netboot.tar.gz archive, and it affects the configuration we’ll use for bootpd. Note the hd option:

# /etc/bootptab: database for bootp server (/usr/sbin/bootpd)
mgmtvpn:
  hd=/:
  bf=pxelinux.0:
  ip=10.1.0.1:
  sm=255.255.255.0:
  sa=10.1.0.2:
  ha=XXXXXXXXXXXX:

hd is set to / since we’ve told the tftp server that /var/lib/tftpboot is its root directory. ha needs to be the MAC address of the NIC on the ALIX board you’re using.
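
To tie that together, getting the installer files into place and reloading inetd went roughly like this (assuming you’ve already grabbed the Lenny i386 netboot.tar.gz from a Debian mirror, and that you’re running the default openbsd-inetd like I am):

cd /var/lib/tftpboot
tar -zxf /path/to/netboot.tar.gz      # drops pxelinux.0 and friends into the tftp root
/etc/init.d/openbsd-inetd restart     # pick up the uncommented inetd.conf entries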

Next comes the patch to add serial console support to the syslinux configuration used in the netboot. The lack of serial console in the installer is documented in bug 309223. There’s a patch posted as a workaround but it’s for the amd64 installer and has a lot of options we don’t need (the GTK installer won’t do us much good over the serial console). The patch isn’t short so I won’t include it in its entirety. It can be downloaded here: installer.diff.gz. Copy this file to the root of the installer directory and apply the patch:

zcat installer.diff.gz | patch -p1

Notice that we’ve set the serial console to 9600 baud just like we did in the ALIX BIOS menu.

From here the installer should work just like it would using VGA. The serial console is slower (though we may be able to speed it up a bit using a higher baud rate) and the Geode CPU is only 500MHz, but the install didn’t take long. Now the last detail: I’m using my laptop to NAT traffic from the ALIX system to my wireless network when doing the install. This isn’t a requirement and if you’ve got a wired network available then you may want to just use that as is.

Next we need to configure some odds and ends specific to the ALIX system. That’s coming up next.

Installing Voyage on new ALIX system

Back around 2005 I was still new to Linux. I had settled into running Debian on my desktop and I needed a new project. At the time I had a crappy DLink router / access point that would get “confused” quite consistently and had to be reset. After a roommate of mine moved out and left behind an old Dell Pentium III I decided to replace the DLink. I scraped together an extra Ethernet card and a Netgear 802.11b PCI card and started messing around. Surprisingly enough, I turned an old PIII desktop into a router / wireless access point.

I can’t remember how I ran across PCEngines but their WRAP single board computer seemed like a fun and significantly more efficient replacement for my PIII access point. Installing Debian on a CF card using debootstrap was pretty straightforward. My WRAP system has been routing my network traffic for 3+ years now and has required minimal to no upkeep (except for fixing my own iptables mistakes). There’s even an article on the HowTo Forge now that you can follow step-by-step.

I was always concerned however that the number of disk writes of a general purpose Linux system (pretty much everything in /var) would eventually wear out the CF card. I suppose after 3 years of operation I can say this may not be as big an issue as I first thought. Still, after purchasing another board from PC Engines I decided to install Voyage, a Debian based distro aimed at CF based embedded systems like the ALIX2d3 I’m setting up to be a VPN end point:

alix2d3 single board computer

Installing Voyage is well documented so I won’t repeat it here. You can check out their site for the details. My general impression of Voyage so far is that it’s a bit out of date and that installing Debian directly is likely a better option. The current stable release of Voyage (version 5.2) is still based on Etch so the version of racoon is pretty old … oh yeah, and I couldn’t get it to boot with GRUB but LILO worked fine.

It’s not all bad though. Voyage has a really cool set-up for minimizing the number of disk writes: they symlink files that need to be writable to a tmpfs. Everything else is mounted read only. It’s also less than half the size of a minimal Debian install which in some circumstances may be important but since 2GB CF cards can be found for less than $20 this is a non-issue.

So after getting Voyage 5.2 up and running I’m going back to a minimal Lenny install using the Voyage kernel like I did on my older WRAP system … pretty much just like in the howto forge article. Maybe I’ll get fancy and mount directories under /var as a tmpfs to minimize disk writes, or even enable SELinux.

ThinkPad x61s UltraBay docking script

UPDATE: I’ve updated this script with a much better design because SELinux wouldn’t let me muck around with X’s tmp files. dock.sh.

Over a year ago I decided to pick up a new ThinkPad, this time the ultra portable x61s. My desktop was aging and I needed mobility more than anything else. I had always intended to get the X6 UltraBay so I could set it up on my desk but I never seemed to get around to it (probably on account of the phobia I developed while trying to get RedHat 9 to suspend and resume on my old T40).

Tonight, after getting my new (well new to me) UltraBay from ebay in the mail I took on setting up a script to run when docking and undocking the laptop. First off, I’m impressed with how well Linux works with the docking station “out of the box”. Even without udev rules to handle the dock/undock event the CD-RW/DVD combo drive hot plugs perfectly (I’m running a vanilla 2.6.31 kernel on Debian Lenny).

The udev rules and docking scripts for this set-up are well documented out there on the web but there were some details I had to assemble from a few different places. This post is mostly to collect the details in one place. If it’s useful to someone else out there even better.

First let’s define what we’re trying to do: I’ve hooked up an external monitor to the UltraBase and I want to distribute my desktop across the external monitor and the LVDS display on the laptop when it’s docked. When it’s undocked the laptop should return to using only the LVDS display. We can script all that using a little bit of control logic and some very basic xrandr commands.

Let’s run through getting xrandr working first, then worry about when and how the script should be run. ThinkWiki has some great info for using xrandr to configure an external monitor. But when I ran some of these commands manually, nothing happened?

Turns out that when I first installed Debian on this laptop there wasn’t an external monitor attached (big surprise) so when X was configured it generated a configuration file that couldn’t handle the external monitor. This means xrandr could query properties from the connected monitor just fine, but any attempt to change the configuration did nothing. The command gave a successful status code but nothing happened.

The suggested fix is to attach the second monitor and reconfigure X. On Debian we’d expect this to be done through dpkg:

dpkg-reconfigure xserver-xorg

Which does nothing. Turns out you can tell X to generate a config file directly which does the trick this time:

sudo X -configure

I know nothing about X configuration files so I can’t say why this works, but it does: now when we send commands through xrandr, X responds.

Start small by just turning on the external display:

xrandr --output LVDS --auto --output VGA --auto

Your displays may be named a bit differently which you can check using the query option (-q). The external monitor turned on but the screens overlapped in a funny way … progress.

The effect I want is to have my desktop extend across both screens so they’re side by side. xrandr does this without any hassle:

xrandr --output LVDS --mode 1024x768 --output VGA --mode 1024x768 --left-of LVDS

The command above makes the layout pretty obvious: both screens are 1024 by 768 and the VGA screen is positioned to the left of the built in LVDS. Some people want their external screen to have a higher resolution but it’s easy to change the configuration so I’m going for symmetry to start off. That and the larger the screens get the more RAM the video card borrows from the rest of the system.

This is the command we want to run when the laptop is docked, now the command when it’s undocked:

xrandr --output VGA --off

The two commands can be combined into a script to which we can pass a parameter: a 1 when the system is being docked, a 0 when it’s undocked. Throw in a case statement and then we can test it:

#!/bin/sh
case $1 in
    0)
        echo "undock" | logger -t $0
        xrandr --output VGA --off
        ;;
    1)
        echo "dock" | logger -t $0
        xrandr --output LVDS --mode 1024x768 --output VGA --mode 1024x768 --left-of LVDS
        ;;
    *)
        echo "unexpected input" | logger -t $0
        exit 1
        ;;
esac
exit 0

Make this file executable and put it in /etc/thinkpad; call it dock.sh.
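
In other words, something like:

mkdir -p /etc/thinkpad
cp dock.sh /etc/thinkpad/dock.sh
chmod +x /etc/thinkpad/dock.sh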

Now to get the system to run the script for us when the laptop is docked and undocked. Debian Lenny uses udev so this is as simple as adding the following to a file in /etc/udev/rules.d/:

KERNEL=="dock.0", ATTR{docked}=="1", RUN+="/etc/thinkpad/dock.sh 1"
KERNEL=="dock.0", ATTR{docked}=="0", RUN+="/etc/thinkpad/dock.sh 0"

I named this file 55-thinkpad-local.rules based on someone’s reported success on the linux thinkpad mailing list. The order in which udev rules are run, why the order is important and how to write them is still a mystery to me and will remain that way for now.

Now we put the two together. The log messages sent to syslog should be checked to be sure the script is actually running, because when the laptop is docked/undocked … the screen layout won’t change! Great, right? We can capture the error message by redirecting the output of the xrandr commands to syslog too, like so:

xrandr --output LVDS --mode 1024x768 --output VGA --mode 1024x768 --left-of LVDS 2>&1 | logger -t $0

The error message tells us that the script “Can’t open display”. Wait, root (the script is run as root) doesn’t have permission to open the display? This turns out to be some X magic that’s explained on the ThinkWiki Fn-F7 page. The important part is way down at the end of the script, where root enumerates the X sessions and then, one by one, exports the server identifiers and the Xauthority cookie. After this root can change the display all it wants. We’ve gotta include this in the original script and while we’re at it throw in some extra stuff to make it pretty. The final script is here. Works pretty well for me, but YMMV.
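
I won’t reproduce the ThinkWiki logic verbatim, but the idea is roughly the sketch below. The who-based owner lookup and the .Xauthority path are simplifying assumptions on my part; the real script is more careful.

# enumerate running X sessions and borrow each one's display and cookie
# so that xrandr commands issued by root (via udev) actually reach the server
for sock in /tmp/.X11-unix/X*; do
    DISPLAY=":${sock##*X}"
    # naive lookup of the session owner; assumes who reports the display name
    user=$(who | awk -v d="(${DISPLAY})" '$NF == d { print $1; exit }')
    [ -n "$user" ] || continue
    XAUTHORITY="/home/${user}/.Xauthority"
    export DISPLAY XAUTHORITY
    xrandr --output VGA --off 2>&1 | logger -t dock.sh
done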