xattrs in OE rootfs part 2

This is an update on some work from a previous post. My xattr patches sat on the yocto mailing list waiting for meta-selinux maintainers to review them for about a month. I got a bit testy and even commented to another contributor off list about the possible need to fork given the lack of response from maintainers. Turns out that this contributor was using his gmail account but actually works for the same company as the maintainer. So about 24 hours later the maintainers magically reappeared and merged my patches!

I really have no interest in forking and maintaining meta-selinux so this was a win. Hopefully I didn’t look like too much of a jerk in the process … ¯_(ツ)_/¯ After a quick stint in a feature branch my xattr code was merged into meta-selinux master. This is only a step in the process though. These patches really belong in the openembedded-core repository. They can likely be moved with minimal effort on the technical side. Getting traction for these sorts of fringe features upstream however takes a bunch of time though, as demonstrated by the 2 months it took to get this into meta-selinux.

As a strategy I should probably prove out an additional use case beyond SELinux file labels. I did some work on meta-measured a while back to add basic IMA support. I really want to expand this to support IMA appraisal. I’m not a huge fan of local enforcement policies but this use case is a great example of build-time xattr usage that would demonstrate the utility of my xattr patches. This is all food for thought though at this point.

Purism, the TPM and the FSF

Alright, I’ll come out and say it: I’m optimistic about the Purism / Librem project. The impending release of their first product is really exciting. Their goals are extremely ambitious (the hallmark of a worthwhile project) and despite having come up short of their stated goals and receiving some harsh though justified critique on the web, they’ve still produced a laptop that’s shipping soon. And so my optimism is holding out.

This is my first post on this topic and is mostly introductory: basically why I care about the project and the subject in general. My goal is to develop a series of posts discussing freedom and security (two things that I’m convinced are related) in the context of general purpose computing and the Purism Librem laptop. If you love freedom, free software and computer security, read on … if not, read on anyways.

Criticism from the Web

Not everyone is as impressed with Purism as I am, and the best articulation I’ve found is from Alexandru Gagniuc on the coreboot blog. The TL;DR is that people have been working in the free hardware space for a long time. The impression is that Purism showed up and claimed that they could do in 6 months what others have been working towards for years. Many have taken this as a sign that something is amiss. Those working in OSS see this sort of stuff all the time: someone shows up on your mailinglist with big plans, lots of enthusiasm … and then they figure out how much work it will take and you never hear from them again. I’m hopeful that, despite falling short of their initial goals, the folks at Purism stick around and continue to chip away at their stated goals.

Something that I find particularly troubling is the increasing rhetoric on the Purism website around “fighting for your freedom” all while hedging on past promises. The new video on their homepage with the young cartoon woman flying their flag and weeping cartoon tears for her lost freedoms has set my instinctive negative reaction to over-the-top marketing into high gear. Add to this some genuine and public concerns over the claims made by Purism and I’d say that they’ve got a bit of digging to do if they want to get out of this hole.

Lesson to learn: Grand or unrealistic claims about security, privacy-preserving and software freedom won’t go unnoticed or unchallenged when your target audience knows what’s up. Purism is catering to a niche market. Their target audience is extremely savvy about these things and they’re not going to be shy about “calling bullshit”. Skeptics are going to be skeptical, they’re going to demand proof. This is a good thing.

Continued Optimism

Despite all of this, I’m still optimistic about the project. Even better, the folks at Purism seem to be paying attention to their critics. They’ve published a road map detailing the steps necessary to reach their stated goal of FSF RYF (respects your freedom) certification. No doubt it’s going to be an uphill battle and the folks over at coreboot may be right: the entirety of the firmware may never be OSS. But having an OEM who actually wants to fight for the user will never be a bad thing so long as they don’t alienate their target audience in the process. In a time when OEMs are getting busted for integrating ad/spy-ware into their firmware we need a change.

The Future: Freedom and Security

This is probably going to get dangerously close to an attempt to predict the future, but I’m very hopeful for Purism and their products. I haven’t been following too closely but when I looked the other day, sure enough the specs for the Librem 15 list a TPM and it’s a version 2. Sweet. More on this in a future post.

Additionally, the FSF has endorsed CrowdSupply (the crowd funding platform used by Purism). I’m trying not to read too far into this but I can’t help but think that this represents an organized push from free software movement into hardware while adopting the pro-security-and-privacy rhetoric that’s become so relevant in the wake of the Snowden revelations.

I generally think of myself as a pragmatist when it comes to running purely open source software. This may just be a result of the need to load proprietary / binary firmware to get the wireless card on my laptop to function properly. It also could be related to my love of security technologies and the fact that the FSF has taken such a hard-line in their labeling of the TPM as malicious, a position I’ve always viewed as misinformed. I’m hopeful that the Purism project can help mend the rift between the free software movement and the security technologies that are essential to preserving our freedoms in an increasingly hostile computing environment.

This post was a sort of introduction, just some background about why I care about the Purism project. In my next post I’m hoping to get into why I think freedom and security have become so closely related. Stay tuned.

Talk at Xen Developer Summit 2013

UPDATE: Here’s the link: http://www.youtube.com/watch?v=6Q8mlTBn-ZI. I still haven’t been able to bring myself to actually watch it but I’m sure it’s great :

Just got back from the 2013 Xen Developer Summit where I gave a talk on a few interesting (to me at least) things. If you’re interested you can find my abstract here. My focus was naturally on SELinux / XSM stuff. Mostly my talk focused on the sVirt implementation in XenClient XT and another fun application of the architecture to our management stuff.

Had good chat with a guy from Amazon afterward about all of the other evil stuff someone could do if they compromised QEMU. So while sVirt prevents the specific scenario presented I’ve no doubt there are other hazards. He was specifically concerned over the Xen privcmd driver & the hypercalls it could make. Hard to disagree as QEMU with root permissions in dom0 can execute any hypercall it wants. The only way to address this (other than stubdoms) is to deprivilege QEMU to prevent it from making hypercalls. That would probably require some code-changes in QEMU so it’s no small task.

I also touched briefly on the design for an inter-VM communication (IVC) mechanism that was floated to xen-devel this summer. In XT we have an IVC called ‘V4V’ that isn’t acceptable to upstream. When it came to our XSM policy however V4V had some favorable properties in that we created a new object in the hypervisor that was a ‘first-class’ object in the policy.

The proposal uses the same model as the front/back drivers so there would be no new object specific to the IVC. This means there wouldn’t be way to differentiate the IVC from any other front/back driver. The purpose of the talk was to point this out and hopefully solicit some discussion. Got an even better conversation going on this point so hopefully I’ll have some fun stuff to report on this front soon.

Calculating the MLE hash

My work to calculate PCR[18] from the last post was missing one big piece. I took a short cut and parsed the MLE hash out of the SINIT to MLE data table. This was a stop gap.
The MLE wasn’t being measured directly. We were still extracting the measurement as taken by the SINIT which is a binary blob from Intel. We don’t have a choice in trusting this blob from Intel but we can verify the measurements it takes. With this in mind I’ve gone back and added a tool to the pcr-calc module to calculate the MLE hash directly from the MLE.

The MLE Hash

Calculating the MLE hash is a bit more complicated than just hashing the ELF binary that contains it. There’s already a utility that does this in the tboot project though it’s pretty limited as it only dumps out the hash in a hex string. My end goal is to integrate this work into a bitbake class so having a python class to emit a hash object containing the measurement of the MLE is a lot more convenient.

In the pcr-calc project I’ve added a few things to make this happen. First is a class called mleHeader that parses the MLE header. This is just more of the mundane data parsing that I’ve been doing since this whole thing started. Finding the MLE header is just a matter of searching for the magic MLE UUID: 5aac8290-6f47-a774-0f5c-55a2cb51b642. Having the header isn’t enough though. The MLE must be extracted from the ELF and this is particularly hard because I know nothing about the structure of ELF files.

To do the extracting I basically ported the mlehash utility from tboot to python. The MLE is actually stored in the ELF file program header. This requires parsing and extracting the PT_LOAD segments. Writing a generic ELF parser is way beyond the scope of what I’m qualified to do but thankfully Eli Bendersky already has a handle on this. Check out pyelftools on his github page. You can download the package for pyelftools through the python package system like so:

$ pip install pyelftools

I’ve not yet integrated a check for this package into the pcr-calc autotools stuff yet but I’ll get around to it.

So in pcr-calc, the MLEUtil class does a few things. First it unzips the ELF file if necessary. Second, the ELFFile class from pyelftools is used to extract the PT_LOAD segments from the ELF. These are copied to a temporary file and the excess space is zero-filled. Once the ELF is extracted we locate the MLE header by searching for the UUID above. This header is represented and parsed by the mleHeader object.

The end goal is to calculate the SHA1 hash of the MLE. The fields in the header we need to do this are mle_start_off and mle_end_off. These are the offset to the start and end of the MLE respectively. Both offsets are relative to the beginning of the extracted ELF. The hash is then simply calculated over the data in this range.


With the objects necessary to calculate the MLE hash done I went back and updated the pcr18 utility. Now instead of parsing the hash out of the TXT heap it now hashes the MLE directly. The mlehash program is constructed in a similar way but it is limited to calculating the MLE hash only.


A significant amount of the work in calculating the MLE hash was just code reading, firstly to understand how to extract and measure the MLE, second to understand how use the pyelftools package. Using pyelftools means that pcr-calc has a new dependency but it’s a lot better than implementing it myself. Working with pyelftools has been beneficial not only in that it saves me effort but it’s also an excellent example to work from. pcr-calc is my first attempt at implementing anything in python and it shows. Having poked around in pyelftools a little bit I’ve realized that even though my code “works” it’s pretty horrible. Future efforts to “clean up” pcr-calc will model significant portions of it after the code in pyelftools.

Having completed calculating the MLE hash we’ve taken a big step forward in our effort to construct future PCR values by measuring the individual components. It’s the last step in removing dependence on the extracted heap. We can now calculate PCR[18] and PCR[19] without any knowledge of or access to the deployed platform hardware and that’s pretty great. PCR[17] by contrast contains a whole bunch of stuff like the STM hash that’s independent from the Linux OS being run. For now I’m happy to assume PCR[17] is static for a system and doesn’t need to be calculated in the build system.

Eventually I’d like to extend pcr-calc to include mechanisms for ingesting an LCP and calculating PCR[17] but that’s a long way off. Instead, my next steps will be to clean up the pcr-calc code and integrating it into the meta-measured OE layer. The end goal here is to produce a manifest that a 3rd party (an installer or a remote system) can use to either seal secrets to a future platform state or for appraising an attestation exchange. More on this front next.

Calculating PCR[18] and PCR[19]

In my first post on the subject, I indicated calculating PCR[18] and PCR[19] was significantly easier than PCR[17]. If you read my last post on calculating PCR[17] you’ll see why. Since then I’ve gone through and sorted out calculating the remaining two PCRs and it is in fact pretty easy (few!).


Unlike PCR[17], 18 and 19 are mostly just hashes of boot modules. Unfortunately the MLE Developers Guide again is a bit vague here and maybe a bit misleading even. I’m starting to think that the guide on the Intel website is out of date.

The state of PCR[18] is defined in section 1.9.2. From the text:

PCR 18 will be extended with the SHA-1 hash of the MLE, as reported in the
SinitMleData.MleHash field.

This is true but it’s only the first extend operation. There can be a second and by default (default tboot behavior) there is. The second hash is of the first boot module after the MLE. The is the module tboot executes after the SINIT.

So what’s the MLE (Measured Launch Environment) anyways? The MLE is the thing that does the measured launch. Pretty much it’s the code that executes GETSEC[SENTER] and hands control over to the SINIT ACM, that’s tboot. After a successful measured launch this hash will be in the TXT heap in the SINIT to MLE data table. The field is called the MleHash.

Like I said in my first post: this isn’t about blindly trusting the hashes in the heap though. We want to be able to isolate the thing being measured so if the MLE is tboot then we should be able to take the SHA1 hash of the tboot binary and get the same value … or not. There’s actually a structure within tboot that defines the MLE. The hash of this structure is what’s in the MleHash filed of the SINIT to MLE data table.

There’s already a stand-alone program in the tboot code to calculate this hash for use in the LCP. I’ve not yet looked too deep into how this is calculated though. For now, if you’re interested, you can use this tool to calculate the MleHash and compare this to the data reported in your TXT heap or the txt-stat output. Mine looks like this:

  $ ./lcp_mlehash -c "logging=serial,vga,memory" ~/txt-test/modules/tboot.gz
  5b d5 12 72 1e 07 5e 31 4d 8d e5 2e 5f b9 10 04 d4 00 e7 27

A very important thing to note here is that we pass not only the module to this program but the arguments passed to it as well.

So the MLE Dev Guide indicates that PCR[18] will be extended with the MleHash we’ve just now calculated. But if you check the status of your PCR[18] after a measured launch you’ll realize that it can’t be calculated with just the MleHash. So the MLE Dev Guide isn’t really the best documentation for this stuff I guess. You’re much better off reading the README in the tboot sources.

In the README it’s clearly stated that PCR[18] is extended not only with the MleHash but also with the hash of the first boot module. The bit that’s missing of course is a description of how modules are hashed. The exact algorithm for doing this isn’t in the Dev Guide or the README though, it’s in the tboot source code in the file tboot/common/policy.c.

A module hash is simply the hash of the command line passed two the module concatinated with the module itself (remember the two parameters passed to the lcp_mlehash above). There isn’t much to processing the command line but if you’re interested you can check out the code. I’ve added a utility called module-hash to automate this to the pcr-calc library. Using the notation we’ve been using up till now this could be represented as:

  module_hash = SHA1 (cmdline | module)

You can invoke the utility to perform this calculation like so:

  module-hash --cmdline tboot.cmdline --module tboot.gz

Combine this with the MleHash and the full PCR[18] calculation is as follows:

  PCR[18]_1 = sha1 (PCR[18]_0 | MleHash)
  PCR[18]_2 = sha1 (PCR[18]_1 | sha1 (cmdline | module))

there are two extends required to calculate PCR[18]. And as a reminder: PCR[18]_0 is the PCR before any extend operations so it contains 20 bytes of 0’s. In my example setup the module that’s hashed and extended into PCR[18] after the MLE is just the linux kernel file vmlinuz.

The pcr-calc project has a pcr18 utility too that wraps the MleHash and the hashing of the module for convenience. It gets the MleHash from the heap still but the next logical step is to calculate it directly from the tboot binary and the commandline. Invoke pcr18 like so:

  $ pcr18 --module vmlinuz --cmdline vmlinuz.cmd txtheap.bin

The --cmdline argument is a file containing a single line of text which is parsed as the parameters to the module.


PCR[19] is just as trivial to calculate and again the Dev guide is little help. There’s nothing mentioned of this PCR in the Dev Guide but the LCP dictates what gets hashed and stored in it.
The default tboot policy extends PCR[19] with the hash of all remaining modules. That’s all of the boot modules that aren’t an SINIT ACM, the MLE or the first module.

This leaves the initrd on my system (yours may be different and may have more than one module). If the module is compressed it must be decompressed before it’s hashed. The calculation of PCR[19] can be described as follows:

  PCR[19]_1 = sha1 (PCR[19]_0 | sha1 (cmdline_1 | module_1))
  PCR[19]_n = sha1 (PCR[19]_n-1 | sha1 (cmdline_n | module_n))

Remember that module_0 is tboot so it’s measured as part of PCR[18] so we start here with cmdline_1 and module_1.

The pcr-calc project has yet another program pcr19 that automates the calculation of PCR[19]. The calling convention is a bit awkward here and I’ll probably have to come up with something better:

  $ pcr19 module1.cmd,module1 module2.cmd,module2

The arguments are all positional. They’re ordered pairs of files with the first being a file containing the command line text and the second being the module itself. This means your file paths can’t have commas in them.


When I started out on this quest to pre-calculate the DRTM PCRs as populated by the Intel TXT hardware and tboot I knew this would be painful but I didn’t realize how much. It really was an exercise in masochism. Now that I’ve jumped through the hoops and done the digging necessary to understand the process, I’ve got a pretty good understanding of what’s required to move to the next phase of this project.

My prevous work with OE and the meta-measured layer is the groundwork. It produces a system image that will do a TXT measured launch but the PCR values after boot are still a mystery. I had hoped to be able to calculate all of these measurements in the build, but access to the TXT heap from the target device isn’t realistic. From the work discussed above however, it looks like PCR[18] and PCR[19] can be calculated reliably.

The remaining work is to calculate the MleHash, though the existing tool in tboot is extremely close to what we need. Combined with the tools from pcr-calc this is likely sufficient. All of this will need to be combined and integrated into the meta-measured layer likely as part of an image class. Sounds like my next task is to clean this stuff up and revive meta-measured.

Calculating PCR[17]

When I left off my last discussin of tboot PCR calculations I gave a quick intro but little more. In this post I’ll go into details for calculating the first of them: PCR[17].

There have been a number of discussions with regard to calculating or verifying PCR values on the tboot-devel mailing list and they were extremely useful in writing this code and post. These all fell a bit short of what I wanted to accomplish in that all approaches extracted hashes from the output of the txt-stat program (the tboot log) and used those hashes to re-construct the PCR values. I wanted to construct all hashes manually, to measure and account for the actual things that TXT and tboot were measuring and storing into the PCRs and to do this independent of an actual measured launch. Basically this translates to isolating the things being measured, extracting them (if possible) and use them to reconstruct the PCR value on any system, like a build server or an external verifying party.

The process is pretty straight forward, though time consuming, and the specification is phrased in such a way as to force some guess and check. There’s even a bit of a trick in the end which requires that we go digging around in the tboot source code which is always fun. I’ll also present a bit of code that will automate the calculation for you so if you’re anxious and don’t want to read any more you can go straight to the code which can be found here: .

DISCLAIMER: The code in the pcr-cal git repo is very much a work-in-progress and should be considered unstable at best so YMMV.

The spec that defines the DRTM specific PCRs is the “PC Client Implementation for BIOS”. These are PCRs 17 through 20. Their individual use however is hardware specific and on Intel hardware, the definitive source of data on what gets extended into which of these PCRs as part of establishing a DRTM is a document titled “Intel® Trusted Execution Technology (Intel® TXT) Software Development Guide: Measured Launched Environment Developer’s Guide”.

Quite a mouth full. Anyways section 1.9.1 covers PCR[17] but the details of what various bits are measured are spread out over the document. A default tboot configuration will cause 3 extends to this PCR so we’ll break this post up into 3 sections, each one describing the hashes that go into the 3 consecutive extend operations.

First extend: SINIT ACM

The first thing that’s extended into PCR[17] is the hash of the SINIT ACM. This is a binary blob that Intel ships which is used by tboot to establish the DRTM on a platform. The binary code in the ACM is chipset specific so there are a number of ACMs out there to chose from. tboot automates the process so if you’re unsure which ACM is the right one for your platform you can configure your bootloader to load every ACM and tboot will pick the right one. This will slow your boot process down considerably though and selecting the proper one isn’t hard with a bit of reading so don’t be lazy.

With the right ACM in hand you’d think it would be a simple matter of calculating the sha1 hash of the file and extending that into PCR[17]. That’s not the case though. There are two little details that must be sorted first.

Depending on the version of the ACM you’re using the hash algorithm may be sha1 or it may be sha256. ACMs version 7 or later will use sha256, while earlier versions will use sha1. The current version of the ACM format is 8 so most modern hardware will need a sha256 hash (not to mention that most OEM implementations of TXT 3 years or older never worked in the first place … snap!).

Further, there are some fileds in the ACM that aren’t included in the hash. The logic behind this escapes me but the apendix A.1.2 specifies that some fields are omitted. Quoting the spec: “Those parts of the module header not included are: the RSA Signature, the public key, and the scratch field.” That sounds like 3 fields from the ACM right? Wrong: there’s a 4th field omitted as well and that’s the RSA exponent. I guess they meant for the exponent to be included in the definition of “public key”? Thanks for being explicit.

Anyways omit the fields: RSAPubKey, RSAPubExp, RSASig and the Scratch space, got it. To omit these fields from the hash we’ve gotta parse the ACM. I’ll cover this code at the end.

Finally the 32 bits that make up the EDX register which hold the flags passed to the GETSEC[SENTER] instruction are appended to the hash of the ACM. We represent PCR[17] at the first extend operation thusly:

PCR[17]_1 = Extend(PCR[17]_0 | SHA256 (ACM | EDX))

where PCR[17]_0 is the state of PCR[17] at time = 0. PCRs are initialized to 20bytes of 0’s so PCR[17]_0 is 20 bytes of 0’s.

Second Extend: Heap Data

The second extend to PCR[17] includes various bits of data from the TXT heap. Appendix C describes the TXT heap as a contiguous region of memory set asside by the BIOS for use by ‘system software’ (aka BIOS) to pass data to the SINIT ACM and the MLE. PCR[17] is extended with the sha1 hash of either 6 or 7 concatenated pieces of data depending on the version of the ACM. The following fields are concatenated together and their sha1 hash is extended into PCR[17] for the second extend:

  1. BiosAcmId
  2. MsegValid
  3. StmHash
  4. PolicyControl
  5. LcpPolicyHash
  6. OsSinitCaps or 4 bytes of 0’s as specified by the LCP (more on this next)

If the SINIT to MLE data table version is 8 or greater an additional 4 bytes are appended representing the processor S-CRTM status. These 4 bytes are in the ProcScrtmStatus field in the SINIT to MLE data table.

The second extend to PCR[17] could be represented as follows for SINIT to MLE data table versions < 8:

PCR[17]_2 = sha1 (PCR[17]_1 | sha1 ( SinitMleData.BiosAcm.ID | SinitMleData.MsegValid |
                                     SinitMleData.StmHash | SinitMleData.PolicyControl |
                                     SinitMleData.LcpPolicyHash | (OsSinitData.Capabilities, 0)))

For SINIT to MLE data table versions >= 8:

PCR[17]_2 = sha1 (PCR[17]_1 | sha1 ( SinitMleData.BiosAcm.ID | SinitMleData.MsegValid | 
                                     SinitMleData.StmHash | SinitMleData.PolicyControl |
                                     SinitMleData.LcpPolicyHash | (OsSinitData.Capabilities, 0) |

where we use the notation (OSSinitData.Capabilities, 0) to represent a choce made between appending the value of OsSinitData.Capabilities or 4 bytes of 0’s depending on the state of the LCP policy control field.

The astute reader is likely wondering: “How do I get these values out of the TXT heap … and where do I even get the TXT heap from?” Both are very good questions. Getting at the TXT heap isn’t too difficult. You’re on a Linux system presumably with root access. The TXT heap is a region of memory like any other and you only need to know the offset where it resides and how to determine it’s size. Both the offset and the heap size are obtained from the TXT public registers which are mapped to well known memory addresses (read the spec if you’re really interested).

In the git repo linked above I’ve written a simple utility to parse and output the TXT Heap: txt-heapdump. You can run this utility to display the contents of the heap on the system you wish to calcuclate PCR[17] in a human readable form:

$ txt-heapdump --mmap --pretty

You can also use it to obtain the heap as a binary file:

$ txt-heapdump --mmap > txtheap.bin

You can then parse the binary file to display the heap in a human readable form and it’ll look just like it did coming straight from /dev/mem::

$ txt-heapdump --pretty -i txtheap.bin

Once you have the heap as a file you can use the pcr-calc library to parse and extract various bits. Again, I’ll present the utility that does it all for you at the end. But first, the third and final extend …

Third Extend: Launch Control Policy (LCP)

You’d expect that all of the values that are hashed as part of PCR[17] are discussed in the spec under section “PCR 17” … and you’d be wrong. A couple of sections deeper where the LCP is discussed, you’ll find a description of how the LCP policy is measured and it turns out that this measurement gets extended into PCR[17] as well! There are a number of rules laid out in this section for how the system behaves when there’s no ‘Supplier’ or ‘Owner’ LCP present. Specifically the spec states:

As a matter of integrity, the LCP_POLICY::PolicyControl field will always be extended into PCR 17. If an Owner policy exists, its PolicyControl field will be extended; otherwise the Supplier policy’s will be. If there are no policies, 32 bits of 0s will be extended.

I’ve not gone through this section with a very thorough eye so I’m not an authority here, but tboot seems to ignore these rules and instead loads a default policy when there isn’t one in the TPM NV RAM. Not saying this is good or bad, right or wrong, just pointing out that this is what tboot does and it was something that I had to figure out in order to calculate the value of PCR[17] independently on my test systems.

So my goal here is to calculate PCR values. If your system is like mine and both you (the ‘Owner’) and the ‘Supplier’ (your OEM) didn’t provide an LCP, how do we measure the default policy from tboot? The only thing I could come up with is to pull apart the tboot code and copy the hard-coded structures into a C program and then dump them to disk in binary. The hash of this file is the one we need to extend into PCR[17] along with the LCP PolicyControl value. I’ve added a class to the pcr-calc library to parse the necessary parts of the binary LCP to support this operation.

The program that dumps the binary LCP from tboot is: lcp_def I’ve kept this utility in the pcr-calc project to reproduce the LCP on demand. I considered only keeping around the LCP binary in a data file but in the event that the default tboot policy changes in the future I wanted to keep the program around to dump the binary structures. When executed this program just dumps the binary policy so you’ll have to redirect the output:

$ lcp_def > lcp.bin

Final PCR[17] Calculation

Now that we’ve figured out how to do all three independent extend operations and we’ve collected the heap and LCP blobs, we can calculate the final state of PCR[17]. I’ve automated this in the program: pcr17 (very creative name I know). Assuming your heap is in txtheap.bin and your LCP is in lcp.bin your SINIT ACM file is named sinit.acm you should invoke the program as follows:

$ pcr17 -i txtheap.bin -l lcp.bin sinit.acm

Your output should look something like this:

$ ./bin/pcr17 -i ../txt-data/txtheap.bin -l ../txt-data/lcp_def.bin ../3rd_gen_i5_i7_SINIT_51.BIN 
first extend: SINIT ACM hash
  extending with: 0fcc099f81549da4836d492afb8ab2e303cecfa1
  PCR[17] before extend: 0000000000000000000000000000000000000000
  PCR[17] after extend: 8d3dd5c8e795dfac5dbfa9859310b2bcea36d347
second extend: TXT heap data
  append BiosAcmId:
    8000 0000 2010 1022 0000 b001 ffff ffff 
    ffff ffff 
  append MsegValid_Bytes:
    0000 0000 0000 0000 
  append StmHash:
    0000 0000 0000 0000 0000 0000 0000 0000 
    0000 0000 
  append PolicyControl_Bytes:
    0000 0000 
  append LcpPolicyHash:
    0000 0000 0000 0000 0000 0000 0000 0000 
    0000 0000 
  append Capabilities_Bytes: False
    Hashing 4 bytes of 0s in place of OsSinit.Capabilities
  append ProcScrtmStatus_Bytes:
    0000 0000 
  extending with: 7e0cdad3b8d9c344ab89657efdbfa638d1b25978
  PCR[17] before extend: 8d3dd5c8e795dfac5dbfa9859310b2bcea36d347
  PCR[17]: bfa4421b49f6ab899157ba6ee8fec3c5c5abf4ab
third extend: LCP
  lcp hash: ab41624e7d71f068d48e1c2f43e616bf40671c39
  polctrl: 1
  extending with: 9704353630674bfe21b86b64a7b0f99c297cf902
  PCR[17] before extend: bfa4421b49f6ab899157ba6ee8fec3c5c5abf4ab
  PCR[17] after extend: 57a5f1b245ac52614498a728efe7f741b4dc3ebf

PCR[17] final: 57a5f1b245ac52614498a728efe7f741b4dc3ebf

Currently the program will dump the hashes and PCR states after each extend along with the actual sha1 that should be in PCR[17] after a successful TXT measured launch. Take a look at the code if you’re interested in the details. This was as much an exercise for me in learning a bit of python as it was about the actual end result. If given the choice again I’d have implemented this in C just because it’s much easier to deal with binary values and memory ranges in C than in Python. Then again this may just be that I know C better than I know Python so YMMV.


As you’ve probably noticed I was only partially successful in my goal. All of the data from the TXT heap that are extended in to PCR[17] are themselves hashes of things we can’t access. Most of these hashes are all 0’s though denoting that the BIOS implementer opted out of implementing that feature (you’ll have an STM on your system one of these days but don’t hold your breath). The only one that’s actually present is the BiosAcmId but I’d expect in the future for the other fields to be populated as well.

This is just another instance of binary blobs making their way into the TCB of our software systems. We’ve had to deal with these in various forms over the years: binary drivers, firmware and BIOS code. Intel and other chip manufacturers have been making their hardware extensible using firmware and microcode for a while now so it’s no surprise that these things have made their way into the TCB. The good news is that they’re being measured and even if we can’t get our hands on the code, or even the binary blob on account of it being embedded in some piece of hardware, we can still identify them by their hash. The implications for trust aren’t great but it’s a start.

What’s in a hash?

After the initial work on meta-measured it was very clear that configuring an MLE is great but alone it has little value. Sure tboot will measure things for you, it will even store these measurements in your TPM’s PCRs! But the “so what?” remains unanswered: there are hashes in your TPM, who cares?

Even after you’ve set-up meta-measured, launch an MLE and dumped out the contents of /sys/class/misc/tpm0/device/pcrs what have you accomplished? The whole point of meta-measured was to setup the machinery to make this easier and for the PCR values to remain unchanged across a reboot. I was surprised at how much work went into just this. But after this work, the hashes in these PCRs still had no meaning beyond being mysterious, albeit static, hashes.

I closed the meta-measured post stating my next goal was to take a stab at pre-computing some PCR values. Knowing the values that PCRs will have in your final running system allows for secrets to be protected by sealed storage at install time (which I’ve heard called ‘local attestation’ just to confuse things). Naturally the more system state involved in the sealing operation (assume this means ‘more PCRs’ for now) the better. So I had hoped to come back after a bit with the tools necessary for meta-measured to produce a manifest of as many of the tboot PCR values as possible.

Starting with PCR[17]

Naturally I started with what I knew would be the hardest PCR to calculate: the infamous PCR[17]. JPs comment on my last post pointed out some of his heroic efforts to compute PCR[17] so that was a huge help. So first things first: respect to JP for the pointer. This task would have taken me twice as long were it not for his work and the work of others on tboot-devel.

So I set out to calculate PCR[17] but I think my approach was different from those I was able to find in the public domain. The criteria I came up with for my work was:

  1. Calculate PCR[17] for system A on system B.
  2. Do the measurements myself.

So ‘rule #1’ basically says: no reliance on having a console on the running system. This is one part technical purity, one part good design as the intent is to make these tools as flexible as possible and useful in a build system. ‘Rule #2’ is all technical purity. This isn’t an exercise in recreating the algorithm that produces the value that ends up in PCR[17].

This last bit is important. The whole point is to account for the actual things (software, configuration etc) that are measured as part of bringing up a TXT MLE. Once these are identified they need to be collected (maybe even extracted from the system) if possible, and then used to calculate the final hash stored in PCR[17]. So basically, no parsing and hashing the output from ‘txt-stat’, that’s cheating 🙂 I explained this approach to a friend and was instantly accused of masochism. That’s a good sign and I guess there’s an element of that in the approach as well, if not everything I do.

As always, wrapping up one exploratory exercise in learning / brushing up on a language is always a good idea right? So I did as much of my work as possible on this in Python. Naturally I had to break this rule and use some C at the end but that’s a bit of a punchline so I don’t want to spoil that joke.

So if you’re only interested in the code I won’t bore you with any more talk about ‘goals’ and ‘design’. It’s all up on github. The python’s here: https://github.com/flihp/pcr-calc. The C is here: https://github.com/flihp/pcr-calc_c. There isn’t much in the way of documentation but I’ll get into that soon.

If you are interested in the words that accompany this work stay tuned. My next post will give a bit of a tour of the rabbit hole that is calculating PCR[17]. This will include discussion of each ‘thing’ that’s measured and what it all means. Like I said though: the end result is that precalculating PCR[17] for arbitrary platforms is a massive PITA and likely not very useful for my original purposes. After thinking on it a bit however I’m quite certain this info may be useful elsewhere but I’ll save that for discussion on follow-on work.

Measured Launch on OE core

It’s been 4 months since my last post but I’ve been working on some fun stuff. Said work has progressed to the point where it’s actually worth talking about publically so I’m crawling out from under my favorite rock and putting it “out there”.

My last few bits of writing were about some random OpenEmbedded stuff, basically outlining things I was learning while bumbling my way through the OE basics. I’ve been reading through the meta-selinux and meta-virtualization layers and they’re a great place to learn. Over the winter Holiday here I had some extra vacation time from my day job to burn so I finally got serious about a project I’ve been meaning to start for way too long.


Over the past year I’ve been thinking a lot about the “right way” to measure a software system. We’ve implemented a measurement architecture on XT but this has a few down sides: First a system as large as XT is very difficult to use as a teaching tool. It’s hard to explain and show someone the benefits of measuring a system when your example is large, complex and the relevant bits are spread throughout the whole system. Even our engineers who know our build system inside and out often get lost in the details. Second the code belongs to Citrix and closed source software isn’t very useful to anyone except the people selling it.

So after reading through the meta-selinux and meta-xen layers a bunch and learning a good bit about writing recipes I’ve started work on a reference image for a “measured system”. I’m keeping the recipes that make up this work in a layer I call ‘meta-measured’. For this first post on the topic of measured systems I’ll stick to discussing the basic mechanics of it’s construction. This includes some data on the supporting recipes and some of the component parts necessary for booting it. Hopefully along the way I’ll be able to justify the work by discussing the potential benefits to system security but the theory and architecture discussions will be left for a later post.

get the source

If you’re interested in just building it and playing with the live image this is where you should start. Take a look and let me know what you think. Feedback would be much appreciated.

All of the work I’ve done to get this first bootable image working is up on my github. You can get there, from here: https://github.com/flihp. The ‘meta-measured’ layer is here: https://github.com/flihp/meta-measured.git. To automate setting up a build environment for this I’ve got another repo with a few scripts to checkout the necessary supporting software (bitbake / OE / meta-intel etc), a local.conf (which you may need to modify for your environment), and a script to build the ‘iso’ that can be written to a USB drive for booting a test system: https://github.com/flihp/measured-build-scripts.

The best way to build this currently is to checkout the measured-build-scripts repo:

git clone git://github.com/flihp/measured-build-scripts.git

run the ‘fetch.sh’ script to populate the required git submodules and to clone the meta-measured layer:

cd measured-build-scripts

build the iso

If you try to run the ./build.sh script next as you would think you should, the build will fail currently. It will do so while attempting to download the SINIT / ACM module for TXT / tboot because Intel hides the ACMs behind a legal terms wall with terms that must be accepted before the files can be downloaded. I’ve put the direct link to it in the recipe but the download fails unless you’ve got the right cookie in your browser so wget blows up. Download it yourself from here: http://software.intel.com/en-us/articles/intel-trusted-execution-technology, then drop the zip into your ‘download’ directory manually. I’ve got the local.conf with DL_DIR hardwired to /mnt/openembedded/downloads so you’ll likely want to change this to suit your environment.

Anyway I’ll sort out a way to fool the Intel lawyer wall eventually … I’m tempted to mirror these files since the legal notice seems to allow this but I don’t really have the bandwidth ATM. Once you’ve got this sorted, run the build.sh script. I typically tee the output to a file for debugging … this is some very ‘pre-alpha’ stuff so you should expect to debug the build a bit 🙂

./build.sh | tee build.log

This will build a few images from the measured-image-bootimg recipe (tarballs, cpios, and an iso). The local.conf I’ve got in my build directory is specific to my test hardware so if you’ve got an Intel SugarBay system to test on then you can dump the ISO directly to a USB stick and boot it. If you don’t have a SugarBay system then you’ll have to do some work to get it booting since this measured boot stuff is closely tied to the hardware, though the ACMs I’ve packaged work for 2nd and 3rd gen i5 and i7 hardware (Sandy and Ivy Bridge).


I’ve organized the recipes that make up this work into two categories: Those that are specific to the TPM and those that are specific to TXT / tboot. Each of these two technologies requires some kernel configs so those are separated out into fragments like I’ve found in other layers. My test hardware has USB 3.0 ports which the base OE layers don’t seem to have yet. I’ve included this config in my oe-measured distro just so I can use the ports on the front of my test system.

The TPM recipes automate building the Trousers daemon, libtspi and some user space tools that consume the TSS interface. Recipes for the TPM software are pretty straight forward as most are autotools projects. Some work was required to get the trousers project separated into packages for the daemon and library.

The tboot recipes were a bit more work because tboot packages a bunch of utilites in the main tboot source tree so they had to be separated out into different packages (this work is still on-going). Further tboot doesn’t use autotools and they squash most compiler flags that the OE environment passes in. The compler flags required by tboot are static which stands at odds with OE and a cross-compiled environment that wants to change the path to everything including the compiler.

I’ve no clue if tboot will build properly on anything other than an Intel system. Further the issue of Intel hiding the ACMs required for their chipssets behind an EULA wall is annoying as the default OE fetcher won’t work.


My first instinct is always to to describe a system by construction: from the bottom up. In this case I think going top-down is a better approach so we’ll start with the rootfs and work backwards. The TPM recipes includes two images based on the core-image from OE core. That’s one initramfs image and one rootfs. The rootfs is just the core-image with the TPM kernel drivers, trousers daemon, tpm-tools and the tpm-quote-tools. I haven’t done much with this rootfs other than booting it up and see if TXT and the TPM works as expected.

There’s also an initramfs with the TPM kernel drivers, trousers daemon and the tpm-tools but not the quote tools. This is a very minimal initramfs with the TSS daemon loaded manually in the initrd script. It’s not expected that users will be using the tpm-tools interactively here but that’s what I’ve been doing for initial testing. Only the tpm_extendpcr tool (open source from Citrix) is used to extend a PCR with the sha1sum hash of the rootfs before the call to switch_root. This requires that the ‘coreutils’ package be included just for the one utility which bloats the initramfs unfortunately. Slimming this down should’t be too much work in the future. Anyway I think this is ‘the right way’ to extend the measurement chain from the initramfs up to the rootfs of the system.

The rest of the measruements we care about are taken care of by the components from the TXT recipes. There’s only one image in the TXT recipe group however. This is derived from the OE core live image and it’s intended to be ‘deployable’ in the lanugage of OE recipes. I think this means an hddimg or an ISO image, basically something you can ‘dd’ to disk and boot. Currently it’s the basis for a live image but could easily be used for something like an installer simply by switching out the rootfs.

This image is not a separate root filesystem but instead it’s an image created with the files necessary to boot the system: syslinux (configured with the mboot.c32 comboot module), tboot, acms and the initrd and the rootfs from the TPM recipes. tboot measures the bootloader config, all of the boot modules and a bunch of other stuff (see the README in the tboot sources for details). It stores these measurements in the TPM for us, creating the ‘dynamic root of trust for measurement’ (DRTM).

Once tboot has measured all of the modules, the initramfs takes over. The initramfs then measures the rootfs as described above before the switch to root. I’ve added a few kernel parameters to pass the name of the rootfs and the PCR where it’s measurement is to be stored.

If the rootfs is measured on each boot it must be mounted read-only to prevent its measurement from changing … yup even mounting a journaled file system read-write will modify the journal and change the filesystem. Creating a read-only image is a bit of work so for this first prototype I’ve used a bit of a short cut: I’ve mounted the rootfs read only, create a ramfs read write, then the two are combined in a unionfs. In this configuration when rootfs boots it looks like a read / write mount. Thus on each boot the measurements in the TPM are the same.

Next Steps

Measuring a system is all well and good but who cares? Measurements are only useful when they’re communicated to external parties. For now this image only takes measurements and these measurements are the same on each boot. That’s it. Where this can be most immediately useful is that these measurements can be predicted in the build.

The PCRs 0-7 are reserved for the BIOs and we have no way of predicting these values currently as they’re unique to the platform and that’s messy. The tboot PCRs however (17, 18 and 19 in the Legacy mapping currently used) can be calculated based on the hashing done by tboot (read their docs and http://www.mail-archive.com/tboot-devel@lists.sourceforge.net/msg00069.html). The PCR value containing the measurement of the rootfs can be calculated quite simply as well.

For a reference live image this is interesting only in an academic capacity. As I suggest above, this image can be used as a template for something like an installer which would give the predictability of PCR values much deeper meaning: Consider an installer architecture where the installer itself is a very small rootfs that downloads the install package from a remote server (basically Debian’s netboot iso or a PXE boot setup). Assuming we have a method for exchanging system measurements (more future work) it would be very useful for the remote server to be able to evaluate measurements from the installer before releasing the install package.

This is probably a good place to wrap up this post. The meta-measured layer I’ve described is still very new and the images I’ve built are still usefuly only for ‘tire-kicking’. My next post will hopefully discuss predicting measurement values in the build system and other fun stuffs.

Chrome web sandbox on XenClient

There’s lots of software out there that sets up a “sandbox” to protect your system from untrusted code. The examples that come to mind are Chrome and Adobe for the flash sandbox. The strength of these sandboxes are an interesting point of discussion. Strength is always related to the mechanism and if you’re running on Windows the separation guarantees you get are only as strong as the separation Windows affords to processes. If this is a strong enough guarantee for you then you probably won’t find this post very useful. If you’re interested in using XenClient and the Xen hypervisor to get yourself the strongest separation that I can think of, then read on!

Use Case

XenClient allows you to run any number of operating systems on a single piece of hardware. In my case this is a laptop. I’ve got two VMs: my work desktop (Windows 7) for email and other work stuff and my development system that runs Debian testing (Wheezy as of now).

Long story short, I don’t trust some of the crap that’s out on the web to run on either of these systems. I’d like to confine my web browsing to a separate VM to protect my company’s data and my development system. This article will show you how to build a bare bones Linux VM that runs a web browser (Chromium) and little more.


You’ll need a linux VM to host your web browser. I like Debian Wheezy since the PV xen drivers for network and disk work out of the box on XenClient (2.1). There’s a small bug that required you use LVM for your rootfs but I typically do that anyways so no worries there.

Typically I do an install omitting even the “standard system tools” to keep things as small as possible. This results in a root file system that’s < 1G. All you need to do then is install the web browser (chromium), rungetty, and the xinint package. Next is a bit of scripting and some minor configuration changes.


When this VM boots we want the web browser to launch and run full screen. We don’t want a window manager or anything. Just the browser.

When Linux boots, the init process parses the /etc/inittab file. One of the things specified in inittab are processes that init starts like getty. Typically inittab starts getty‘s on 6 ttys but we want it to start chrome for us. We can do this by having init execute rungetty (read the man page!) which we can then have execute arbitrary commands for us:

# /sbin/getty invocations for the runlevels.
# The "id" field MUST be the same as the last
# characters of the device (after "tty").
# Format:
#  :::
# Note that on most Debian systems tty7 is used by the X Window System,
# so if you want to add more getty's go ahead but skip tty7 if you run X.
1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
3:23:respawn:/sbin/getty 38400 tty3
4:23:respawn:/sbin/getty 38400 tty4
5:23:respawn:/sbin/getty 38400 tty5
6:23:respawn:/sbin/rungetty tty6 -u root /usr/sbin/chrome-restore.sh

Another configuration change you’ll have to make is in /etc/X11/Xwrapper.config. The default configuration in this file prevents users from starting X if their controlling TTY isn’t a virtual console. Since we’re kicking off chromium directly we need to relax this restriction:


chromium-restore script

Notice that we have rungetty execute a script for us and it does so as the root user. We don’t want chromium running as root but we need to do some set-up before we kick off chromium as an unprivileged user. Here’s the chrome-restore.sh script:


LAUNCH=$(which chromium-launch.sh)
if [ ! -x "${LAUNCH}" ]; then
	echo "web-launch.sh not executable: ${LAUNCH}"
	exit 1

rsync -avh --delete ${HOMESAFE}/ ${HOMEDIR}/ > /dev/null 2>&1
chown -R ${USER}:${USER} ${HOMEDIR}

/bin/su - -- ${USER} -l -c "STARTUP="${CMD}" startx" < /dev/null
shutdown -Ph now

The first part of this script is setting up the home directory for the user (chromium) that will be running chromium. This is the equivalent of us restoring the users home directory to a “known good state”. This means that the directory located at /usr/share/chromium-clean is a “known good” home directory for us to start from. On my system it’s basically an empty directory with chrome’s default config.

The second part of the script, well really the last two lines just runs startx as an unprivileged user. startx kicks off the X server but first we set a variable STARTUP to be the name of another script: chromium-launch.sh. When this variable is set, startx runs the command from the variable after the X server is started. This is a convenient way to kick off an X server that runs just a single graphical application.

The last command shuts down the VM. The shutdown command will only be run after the X server terminates which will happen once the chromium process terminates. This means that once the last browser tab is closed the VM will shutdown.

chromium-launch script

The chromium-launch.sh script looks like this:


if [ ! -f "${CONFIG}" ]; then
	echo "cannot locate CONFIG: ${CONFIG}"
	exit 1

LINE=$(xrandr -q 2> /dev/null | grep Screen)
WIDTH=$(echo ${LINE} | awk '{ print $8 }')
HEIGHT=$(echo ${LINE} | awk '{ print $10 }' | tr -d ',')

sed -i -e "s&(s+"bottom":s+)-?[0-9]+&1${HEIGHT}&" ${CONFIG}
sed -i -e "s&(s+"left":s+)-?[0-9]+&10&" ${CONFIG}
sed -i -e "s&(s+"right":s+)-?[0-9]+&1${WIDTH}&" ${CONFIG}
sed -i -e "s&(s+"top":s+)-?[0-9]+&10&" ${CONFIG}
sed -i -e "s&(s+"work_area_bottom":s+)-?[0-9]+&1${HEIGHT}&" ${CONFIG}
sed -i -e "s&(s+"work_area_left":s+)-?[0-9]+&10&" ${CONFIG}
sed -i -e "s&(s+"work_area_right":s+)-?[0-9]+&1${WIDTH}&" ${CONFIG}
sed -i -e "s&(s+"work_area_top":s+)-?[0-9]+&10&" ${CONFIG}


It’s a pretty simple script. It takes one parameter which is the path to the main chromium config file. It query’s the X server through xrandr to get the screen dimensions (WIDTH and HEIGHT) which means it must be run after the X server starts. It then re-writes the relevant values in the config file to the maximum screen width and height so the browser is run “full screen”. Pretty simple stuff … once you figure out the proper order to do things and the format of the Preferences file which was non-trivial.

User Homedir

The other hard part is creating the “known good” home directory for your unprivileged user. What I did was start up chromium once manually. This causes the standard chromium configuration to be generated with default values. I then copied this off to /usr/share to be extracted on each boot.


So hopefully these instructions are enough to get you a Linux system that boots and runs Chromium as an unprivileged user. It should restore that users home directory to a known good state on each boot so that any downloaded data will be wiped clean. When the last browser tab is closed it will power off the system.

I use this on my XenClient XT system for browsing sites that I want to keep separate from my other VMs. It’s not perfect though and as always there is more that can be done to secure it. I’d start by making the root file system read only and adding SELinux would be fun. Also the interface is far too minimal. Finding a way to handle edge cases like making pop-ups manageable and allowing us to do things like control volume levels would also be nice. This may require configuring a minimal window manager which is a pretty daunting task. If you have any other interesting ways to make this VM more usable or lock it down better you should leave them in the comments.

sVirt in XenClient

It’s been 5 months since my last post about my on-going project required by my masters program at SU. With the hope of eventually getting my degree, this is my last post on the subject. In my previous post on this topic I described a quick prototype I coded up to test an example program and SELinux policy to demonstrate the sVirt architecture. This was a simple example of how categories from the MCS policy can be used to separate multiple instances of the same program. The logical step after implementing a prototype is coding up the real thing so in this post I’ll go into some detail describing an implementation of the sVirt architecture I coded for the XenClient XT platform. While it may have taken me far too long to write up a description of this project, it’s already running in a commercial product … so I’ve got that going for me 🙂


XenClient is a bit different than the upstream Xen in that the management stack has been completely rewritten. Instead of the xend process which was written in python, XenClient uses a toolstack that’s rewritten in Haskell. This posed two significant hurdles. First I’ve done little more than read the first few pages from a text book on Haskell so the sVirt code, though not complex, would be a bit over my skill level. Second SELinux has no Haskell bindings which would be required by the sVirt code.

Taking on the task of learning a new functional programming language and writing bindings for a relatively complex API in this language would have taken far longer than reasonable. Though we do intend to integrate sVirt into the toolstack proper, putting this work on the so called “critical path” would have been prohibitively expensive. Instead we implemented the sVirt code as a C program that is interposed between the toolstack and the QEMU instances it is intended to separate. Thus the toolstack (the xenmgr process) invokes the svirt-interpose program each time a QEMU process needs to be started for a VM. The svirt-interpose process then does all of the necessary functions to prepare the environment for the separation of the QEMU instance requested from the others currently running.

The remainder of this document describes the svirt-interpose program in detail. We begin by describing the interfaces down the call chain between the xenmgr, svirt-interpose and QEMU.
We then go into detail describing the internal workings of the svirt-interpose code. This includes the algorithm used to assign categories to QEMU processes and to label the system objects used by these processes. We conclude with a brief analysis of the remaining pieces of the system that could benefit from similar separation. In my first post on this topic I describe possible attacks we’re defending against so I’ll not repeat that here.

Call Chain

As we’re unable to integrate the sVirt code directly into the toolstack we must interpose the execution of the sVirt binary between the toolstack and QEMU. We do this by having the toolstack invoke the sVirt binary and then have sVirt invoke QEMU after performing the necessary SELinux operations. For simplicity we leave the command line that the toolstack would normally pass to QEMU unchanged and simply extract the small piece of information we need from it in the sVirt code. All sVirt requires to do it’s job is the domain id (domid) of the VM it’s starting a QEMU instance for. This value is the first parameter so extracting it is quite simple.

The final bit that must be implemented is in policy. Here we must be sure that the policy we write reflects this call chain explicitly. This means removing the ability for the toolstack (xend_t) to invoke QEMU (qemu_t) directly and replacing this with allowing the toolstack to execute the svirt-interpose program (svirt_t) while allowing the svirt-interpose domain to transition to the QEMU domain. This is an important part of the process as it prevents the toolstack from bypassing the svirt code. Many will find protections like this superfluous as it implies protections from a malicious toolstack and the toolstack is a central component of the systems TCB. There is a grain of truth in this argument though it represents a rather misguided analysis. It is very important to limit the permissions granted to a process to limit a possible vulnerability even if the process we’re confining is largely a “trusted” system component.

Category Selection

The central piece of this architecture is to select a unique MCS category for each QEMU process and assign this category to the resources belonging to said process. Before a category can be assigned to a resource we must first chose the category. The only requirement we have when selecting categories is that they are unique (not used by another QEMU process).
Thus there is no special meaning in a category number. Thus it makes sense to select the category number at random.

We’re presented with an interesting challenge here based on the nature of the svirt-interpose program. If this code was integrated with the toolstack directly it would be reasonable to maintain a data structure mapping the running virtual machines to their assigned categories. We could then select a random category number for a new QEMU instance and quickly walk this in-memory structure to be sure this category number hasn’t already been assigned to another process. But as was described previously, the svirt-interpose code is a short lived utility that is invoked by the toolstack and dies shortly after it starts up a QEMU process. Thus we need persistent storage to maintain this association.

The use of the XenStore is reasonable for such data and we use the key ‘selinux-mcs’ under the /local/domain/$domid node (where $domid is the domain id of a running VM) to store the value. Thus we randomly select a category and then walk the XenStore tree examining this key for each running VM. If a conflict is detected a new value is selected and the search continues. This is a very naive algorithm and we discuss ways in which it can be improved in the section on future work.

Object labeling

Once we’ve successfully interposed our svirt code between the toolstack and QEMU and implemented our category selection algorithm we have two tasks remaining. First we must enumerate the objects that belong to this QEMU instance and label them appropriately. Second we must perform the steps necessary to ensure the QEMU process will be labeled properly before we fork and exec it.

Determining the devices assigned to a VM by exploring the XenStore structures is tedious. The information we begin with is the domid of the VM we’re configuring QEMU for. From this we can examine the virtual block devices (VBDs) that belong to this VM but the structure in the VM specific configuration space rooted at /local/domain/$domid only contains information about the backend hosting the device. To find the OS objects associated with the device we need to determine the backend, then examine the configuration space for that backend.

We begin by listing the VBDs assigned to a VM by enumerating the /local/domain/$domid/device/vbd XenStore directory. This will yeild a number of paths in of the form /local/domain/$domid/device/vbd/$vbd_num where $vbd_num is the numeric id assigned to a virtual block device. VMs can be assigned any number of VBDs so we must process all VBDs listed in this directory.

From these paths representing each VBD assigned to a VM we get closer to the backing store by extracting the path to the backend of the split xen block driver. This path is contained in the key /local/domain/$domid/device/vbd/$vbd_num/backend. Once this path is extracted we check to see if the device in dom0 is writable by reading the ‘mode’ value. If the mode is ‘w’ the device is writable and we must apply the proper MCS label to it. We ignore read only VBDs as XenClient only assigns CDROMs as read only, all hard disks are allocated as read/write.

Once we’ve determined the device is writable we now need to extract the dom0 object (usually a block or loopback device file) that’s backing the device. The location of the device path in XenStore depends on the backend storage type in use. XenClient uses blktap processes to expose VHDs through device nodes in /dev and loopback devices to expose files that contain raw file systems. If a loopback device is in use the path to the device node will be stored in the XenStore key ‘loop-device’ in the corresponding VBD backend directory. Similarly if a bit more cryptic, the device node for a blktap device for a VHD will be in the XenStore key ‘params’.

Once these paths have been extracted the devices can be labeled using the SELinux API. To do so, we first need to know what the label should be. Through the SELinux API we can determine the current context for the file. We then set the MCS category calculated for the VM on this context and then change the file context to the resultant label. Important to note here is that both a sensitivity level and a category must be set on the security context. The SELinux API doesn’t shield us from the internals of the policy here and even though the MCS policy doesn’t reason about sensitivities there is a single sensitivity defined that must be assigned to every object (s0).

Assigning a category to the QEMU process is a bit different. Unlike file system objects there isn’t an objct that we can query for a label. Instead we can ask the security server to calculate the resultant label of a transition from the current process (sVirt) to the destination process (QEMU). There is an alteernative method available however and this allows us to deterine the type for the QEMU process directly. SELinux has added some native support for virtualization and one such bit was the addition of the API call ‘selinux_virtual_domain_context_path’. This function returns the path of a file in the SELinux configuration directory that contains the type to be assigned to domains used for virtualization.

Once we have this type the category calculated earlier is then applied and the full context is obtained. SELinux has a specific API call that allows the caller to request the security server apply a specific context to the process produced by the next exec performed by the calling process (setexeccon). Once this has been done successfully the sVirt process cleans up the environment (closes file descriptors etc) and execs the QEMU program passing it the unmodified command line that was provided by the toolstack.


Applying an MCS category to a QEMU process and its resources is fairly straight forward task. There are a few details that must be attended to to ensure that proper error handling is in place but the code is relatively short (~600 LOC) and quite easy to audit. There are some places where the QEMU processes must overlap however. XenClient is all about multiplexing shared hardware between multiple virtual machines on the same PC / Laptop. Sharing devices like the CD-ROM that is proxied to clients via QEMU requires some compromise.

As we state above the CD-ROM is read-only so an MCS category is not applied to the device itself but XenClient must ensure the accesses to the device are exclusive. This is achieved by QEMU obtaining an exclusive lock on a file in /var/lock before claiming the CD-ROM. All QEMU processes must be able to take this lock so the file must be created without any categories. This may seem like a minor detail but it’s quite tedious to implement in practice and it does represent path for data to be transmitted from one QEMU process to another. Transmission through this lock file would require collusion between QEMU processes so it’s considered a minimal threat.

Future Work

This is my last post in this series that has nearly spanned a year. I’m a bit ashamed it’s taken me this long to write up my masters project but it did end up taking on a life of its own getting me a job with Citrix on the XenClient team. There’s still a lot of work to be done and I’m hoping to continue documenting it here. Firstly I have to collect the 8 blog posts that document this work and roll them up into a single document I can submit to my adviser to satisfy my degree requirements.

In parallel I’ll be working all things XenClient hopefully learning Haskell and integrating the sVirt architecture directly into our toolstack. Having this code in the toolstack directly will have a number of benefits. Most obviously it’ll remove a few forks so VM loading will be quicker. More interestingly though it will open up the possibility of applying MCS category labeling to devices (both PCI and USB) that are assigned to VMs. The end goal, as always, is strengthening the separation between the system components that need to remain separate thus improving the security of the system.