TCG TSS2 async APIs and event driven programming

UPDATE 2018-07-03: Example code ported to latest (2.0) release of the tpm2-tss and tpm2-abrmd.

API design is hard. Over the past year I’ve been contributing to an effort to standardize a set of APIs for interacting with TPM2 devices through the Trusted Computing Group’s (TCG) TPM2 Software Stack (TSS) working group (WG). The available APIs and implementations predate my involvement but I’m hoping to contribute some use cases to motivate the need for various aspects of the API and architecture. According to Joshua Bloch use-cases are core to API design so I figure having some thoughts and examples documented and available would be helpful.

Use-case driven design: asynchronous function calls

After watching Mr. Bloch’s talk a few times in the last year, the API design principle that’s stuck with me most is that the design should be motivated / driven by use cases. A use-case adopted by the TSS WG that I’ve found particularly interesting is support for event driven programming environments / frameworks. On several occasions this use case has come into question and generally the argument against the feature comes from the trade-offs involved: detractors argue that the utility of and demand for the use case does not outweigh the complexity that it introduces to the API.

But even though event driven programming is integral to many languages and frameworks, we don’t have any examples of applications using the TPM in such an environment. There aren’t many applications using the TPM to begin with and since the TSS for TPM 1.2 devices never supported async I/O (AFAIK) there was never an opportunity to do so. Since the TSS2 APIs include support for asynchronous operations I thought it was worthwhile to write some example code to demonstrate use of this part of the API in a popular programming framework.

You know what they say: if you build it, they will come.

Integrating SAPI with GLib and GIO

The TSS2 APIs are limited to C. While some bindings to higher level languages have been emerging I’m expecting that C will be the primary language for interacting with the TPM for some time. There is no shortage of options for event frameworks in C, but for the purposes of this post I’m going to focus exclusively on GLib and the GIO libraries.

Before we dive any deeper though I want to include a quick note on event driven programming in general: The term “event driven” doesn’t necessarily mean asynchronous or even non-blocking. Further, not all asynchronous I/O operations are truly asynchronous. Many depend on polling file descriptors with some help from the kernel through the select or poll system calls. I don’t intend to get into the merits or purity of the various approaches though. In this post we’re just using the tools available in a way that works with GLib and GIO. If you’re interested in a “deep-dive” into event driven programming in C there are much better resources out there: start here

GLib and GIO primitives

At the heart of the GLib event model are 3 objects: the GMainLoop, the associated GMainContext and the GSource. These objects are very well documented so I’m going to assume general familiarity. For our purposes we only need to know a few things:

  1. GMainLoop runs the loop that dispatches events. If used properly the GMainLoop will handle the asynchronous I/O operations so we don’t have to write polling or threading code ourselves.
  2. GMainContext is a container used by the GMainLoop. It holds a collection of GSources that make up the events in the GMainLoop.
  3. GSource is an object that holds a number of function pointers that are invoked in response to some event.

Let’s do a quick example to illustrate before we mix in the TSS2 stuff. The following code is an example of a GMainLoop with a single timeout event source. The event source will invoke a callback after a specified time. The timeout event will occur on the interval so long as the callback returns G_SOURCE_CONTINUE.

#include <glib.h>
#include <stdlib.h>

#define TIMEOUT_COUNT_MAX 10
#define TIMEOUT_INTERVAL  100

typedef struct {
    GMainLoop *loop;
    size_t timeout_count;
} data_t;

gboolean
timeout_callback (gpointer user_data)
{
    data_t *data = (data_t*)user_data;

    g_print ("timeout_count: %zu\n", data->timeout_count);
    if (data->timeout_count < TIMEOUT_COUNT_MAX) {
        ++data->timeout_count; 
        return G_SOURCE_CONTINUE;
    } else {
        g_main_loop_quit (data->loop);
        return G_SOURCE_REMOVE;
    }
}
int
main (void)
{
    GMainContext *context;
    GSource *source;
    data_t data = { 0 };

    source = g_timeout_source_new (TIMEOUT_INTERVAL);
    g_source_set_callback (source, timeout_callback, &data, NULL);
    context = g_main_context_new ();
    g_source_attach (source, context);
    g_source_unref (source);

    data.loop = g_main_loop_new (context, FALSE);
    g_main_context_unref (context);

    g_main_loop_run (data.loop);
    g_main_loop_unref (data.loop);
    return 0;
}

In ~40 LOC we have an event loop triggering a timeout on a fixed interval. After a fixed number of timeout events the program shuts itself down cleanly. Not bad for C. It’s worth noting as well that GLib has some convenience functions (in this case g_timeout_add) that would have saved us a few lines here but I’ve opted to do those bits manually to make future examples more clearly relatable to this one.
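
For reference, here’s roughly what the guts of main would look like using g_timeout_add and the default GMainContext instead. This is just a sketch reusing the timeout_callback and data_t from above, not a drop-in replacement:

    /* g_timeout_add attaches a timeout source to the default GMainContext,
     * so the loop must be created against that default context (NULL). */
    data.loop = g_main_loop_new (NULL, FALSE);
    g_timeout_add (TIMEOUT_INTERVAL, timeout_callback, &data);
    g_main_loop_run (data.loop);
    g_main_loop_unref (data.loop);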

Event-driven TSS2 function calls

Now that we have the basic tools required to use a GMainLoop for asynchronous events we’ll play around with adapting them for use with asynchronous calls in the SAPI library. We’ll make this next step a small one though. We won’t go straight to creating GObjects to plug into the GMainLoop. Instead let’s take a look at *how* the TSS2 libraries expose the primitives we need to make async calls and *what* those primitives are.

Asynchronous TSS2

The TCG TSS2 APIs all work together to provide the mechanisms necessary to make asynchronous calls. Each layer in the stack has a purpose and an asynchronous mechanism for carrying out that purpose. From the perspective of an application, the lowest layer is the TCTI library. This API is extremely small and provides transmit / receive functions as an interface to the underlying IPC mechanism. This API can be used by higher layers to send TPM2 command buffers and receive the associated response without having to worry about the underlying implementation.

If the underlying IPC mechanism used by the TCTI library is capable of asynchronous operation then the TCTI may expose non-blocking behavior to the client. The TCTI API provides only one function capable of non-blocking I/O: the `receive` function can be passed a timeout value that will cause the calling thread to block for at most that long waiting for a response. If no response is received before the timeout expires then the caller will be returned a response code instructing them to “try again”. If the `receive` function is passed a timeout value of TSS2_TCTI_TIMEOUT_NONE then it will not block at all. This is about as simple as non-blocking I/O can get.
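
To make that concrete, here’s a rough sketch of polling the receive function using the lower-case convenience macros from the TCTI header (the same style used in the utility code below). It assumes a command has already been transmitted through an initialized tcti_context; the buffer size and the TSS2_TCTI_RC_TRY_AGAIN constant are assumptions on my part so check them against your headers:

TSS2_RC rc;
uint8_t response [4096]; /* arbitrary size for this sketch */
size_t size = sizeof (response);

do {
    rc = tss2_tcti_receive (tcti_context,
                            &size,
                            response,
                            TSS2_TCTI_TIMEOUT_NONE);
    /* a real program would go do useful work here (or wait on the poll
     * handle described below) instead of spinning on the return code */
} while (rc == TSS2_TCTI_RC_TRY_AGAIN);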

Additionally, TCTI libraries provide the getPollHandle function. This function returns the TSS2_TCTI_POLL_HANDLE type, which maps to an asynchronous I/O mechanism on each platform. On Linux it is defined as struct pollfd. On Windows it would make sense to map it to the HANDLE type.

But the TSS2_TCTI_POLL_HANDLE isn’t intended for use by the typical developer. Instead it’s intended for use as a mechanism to integrate the TSS2 stack into event driven frameworks. So if the TCTI mechanism works as advertised we should be able to use this type to drive events in the GLib main loop.
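
Outside of any framework, using the poll handle looks something like the sketch below: grab the handle from the TCTI (the same tss2_tcti_get_poll_handles call used in the full example later) and hand it to poll(2). This works because on Linux the handle is a struct pollfd; the tcti_context name is assumed to be an initialized TCTI context:

#include <poll.h>

TSS2_TCTI_POLL_HANDLE handles [1];
size_t count = 0;
TSS2_RC rc;

rc = tss2_tcti_get_poll_handles (tcti_context, handles, &count);
/* check rc == TSS2_RC_SUCCESS and count == 1 before going further */
handles [0].events = POLLIN;
if (poll (handles, count, -1) > 0 && (handles [0].revents & POLLIN)) {
    /* response data is ready: the receive call will not block now */
}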

Asynchronous SAPI function calls

The mechanisms from the TCTI layer are necessary for asynchronous interaction with TPM2 devices but alone they’re not sufficient. The layers above the TCTI in the TSS2 architecture must participate by providing asynchronous versions of each function. Using the SAPI as an example, each TPM2 function is mapped to a SAPI function call. But the SAPI provides not only a synchronous version of the function call (let’s use Tss2_Sys_GetCapability in this example), but also an asynchronous version.

For the TPM2 GetCapability function the SAPI header provides:

  1. Tss2_Sys_GetCapability – A synchronous function that will send a command to the TPM2 and block until a response is ready to be passed back to the caller.
  2. Tss2_Sys_GetCapability_Prepare – A function to prepare / construct the TPM2 command buffer from the function parameters before they’re transmitted to the TPM.
  3. Tss2_Sys_GetCapability_Complete – A function to convert the response received by Tss2_Sys_ExecuteFinish from the byte stream representation to C structures.

Additionally, two utility functions are required:

  1. Tss2_Sys_ExecuteAsync – A generic function to send the previously prepared TPM2 command buffer to the TPM. It will only call the underlying TCTI transmit function.
  2. Tss2_Sys_ExecuteFinish – A generic function to receive the TPM2 response buffer (the response to the command previously sent). The second parameter to this function is a timeout value that can be set to non-blocking.

These functions provide us with all we need to create the TPM2 command buffer, transmit it, receive the response and transform the response buffer into C structures that we can use in our program.
The event mechanism is able to trigger an event for us when there’s data ready on the struct pollfd from the underlying TCTI module.
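
Before mixing in GLib, the shape of a fully asynchronous GetCapability call looks roughly like the sketch below. Error handling and the wait for the response are omitted, and sapi_context stands in for an initialized TSS2_SYS_CONTEXT; the complete, event-driven version follows below:

TPMI_YES_NO more_data;
TPMS_CAPABILITY_DATA capability_data = { 0 };

/* marshal the command parameters into the SAPI context */
Tss2_Sys_GetCapability_Prepare (sapi_context, TPM2_CAP_COMMANDS,
                                TPM2_CC_FIRST, TPM2_MAX_CAP_CC);
/* transmit the command buffer; returns without waiting for a response */
Tss2_Sys_ExecuteAsync (sapi_context);
/* ... do other work, or wait for the poll handle to become readable ... */
/* collect the response without blocking */
Tss2_Sys_ExecuteFinish (sapi_context, TSS2_TCTI_TIMEOUT_NONE);
/* unmarshal the response buffer into C structures */
Tss2_Sys_GetCapability_Complete (sapi_context, &more_data, &capability_data);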

Combining GSource and the TSS2

We’ve effectively enumerated the tools at our disposal. Now we put them together into a very minimal example. Our goal here isn’t to get perfect integration with the GLib object model, just to use the available tools to hopefully demonstrate the concept.

The TCTI layer mapping the TSS2_TCTI_POLL_HANDLE type to struct pollfd was the big hint in this whole thing. Using the pollfd structure we can extract the underlying file descriptor and use this well known interface to learn of events on the fd. But before we set off to implement a GSource to drive events for the SAPI let’s see if there’s an existing one that we might use. File descriptors (fds) are pretty common on *nix platforms so it makes sense that there may already be a tool we can use here.

Sure enough the GLib UNIX specific utilities and integration provides us with just this. All we must do is provide the g_unix_fd_source_new function with the fd from the underlying TCTI while requesting the event be triggered when the G_IO_IN (input data ready) condition becomes true for the fd.

NOTE: Before we get too far though it’s important to point out that the TCTI we use for this example is libtcti-tabrmd. This TCTI requires the tpm2-abrmd user space resource management daemon. The other TCTIs, for communicating with the kernel device driver and the TPM2 emulator, do not support asynchronous I/O. We use a few utility functions for creating the TCTI and SAPI contexts.

The header:

#include <tss2/tss2_sys.h>
#include <tss2/tss2_tpm2_types.h>
#include <tss2/tss2-tcti-tabrmd.h>
    
/*  
 * Allocate and initialize an instance of the TCTI module associated with the
 * tpm2-abrmd. A successful call to this function will return a TCTI context
 * allocated by the function. It must be freed by the caller.
 * A failed call to this function will return NULL.
 */ 
TSS2_TCTI_CONTEXT*
tcti_tabrmd_init ();
    
/*  
 * Allocate and initialize an instance of a SAPI context. This context will be
 * configured to use the provided TCTI context. A successful call to this
 * function will return a SAPI context allocated by the function. It must be
 * freed by the caller.
 * A failed call to this function will return NULL.
 */
TSS2_SYS_CONTEXT*
sapi_init_from_tcti_ctx (TSS2_TCTI_CONTEXT *tcti_ctx);

/*
 * Allocate and initialize an instance of a TCTI and SAPI context. The SAPI
 * context will be configured to use a newly allocated instance of the tabrmd
 * TCTI. A successful call to this function will return a TCTI and SAPI
 * context allocated by the function. Both must be freed by the caller.
 * A failed call to this function will return NULL.
 */
TSS2_SYS_CONTEXT*
sapi_init_tabrmd (void);

And the implementation:

#include <errno.h>
#include <glib.h>
#include <inttypes.h>
#include <stdlib.h>
#include <string.h>

#include <tss2/tss2_tpm2_types.h>
#include <tss2/tss2-tcti-tabrmd.h>
    
#include "context-util.h"

TSS2_TCTI_CONTEXT*
tcti_tabrmd_init ()
{   
    TSS2_RC rc;
    TSS2_TCTI_CONTEXT *tcti_ctx;
    size_t size;
    
    rc = tss2_tcti_tabrmd_init (NULL, &size);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to get allocation size for tabrmd TCTI context: "
                    "0x%" PRIx32, rc);
        return NULL;
    }
    tcti_ctx = calloc (1, size);
    if (tcti_ctx == NULL) {
        g_critical ("Allocation for TCTI context failed: %s",
                    strerror (errno));
        return NULL;
    }
    rc = tss2_tcti_tabrmd_init (tcti_ctx, &size);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to initialize tabrmd TCTI context: 0x%" PRIx32,
                    rc);
        free (tcti_ctx);
        return NULL;
    }
    return tcti_ctx;
}
TSS2_SYS_CONTEXT*
sapi_init_from_tcti_ctx (TSS2_TCTI_CONTEXT *tcti_ctx)
{
    TSS2_SYS_CONTEXT *sapi_ctx;
    TSS2_RC rc;
    size_t size;
    TSS2_ABI_VERSION abi_version = TSS2_ABI_VERSION_CURRENT;

    size = Tss2_Sys_GetContextSize (0);
    sapi_ctx = (TSS2_SYS_CONTEXT*)calloc (1, size);
    if (sapi_ctx == NULL) {
        g_critical ("Failed to allocate 0x%zx bytes for the SAPI contextn",
                    size);
        return NULL;
    }
    rc = Tss2_Sys_Initialize (sapi_ctx, size, tcti_ctx, &abi_version);
    if (rc != TSS2_RC_SUCCESS) {
        g_critical ("Failed to initialize SAPI context: 0x%xn", rc);
        free (sapi_ctx);
        return NULL;
    }
    return sapi_ctx;
}
TSS2_SYS_CONTEXT*
sapi_init_tabrmd (void) {
    TSS2_SYS_CONTEXT  *sapi_context;
    TSS2_TCTI_CONTEXT *tcti_context;

    tcti_context = tcti_tabrmd_init ();
    if (tcti_context == NULL) {
        return NULL;
    }
    sapi_context = sapi_init_from_tcti_ctx (tcti_context);
    if (sapi_context == NULL) {
        free (tcti_context);
        return NULL;
    }
    return sapi_context;
}

By combining all of this we can create a simple program that makes an asynchronous SAPI function call. At a high level our goal is to execute a TPM2 command without blocking. When the TPM2 device sends our application a response we want the GLib event loop to invoke our callback so that we can do something with the response. We accomplish this with roughly the following steps:

  1. Create the TCTI & SAPI contexts.
  2. Create a GSource for the fd associated with the TCTI using the g_unix_fd_source_new GSource constructor.
  3. Connect the GSource with the GMainContext and register a callback.
  4. Create the GMainLoop and associate it with the proper context.
  5. Make the asynchronous SAPI function call.
  6. Start the GMainLoop.

When data is ready in the TCTI our callback will be invoked and it will do the following:

  1. Finish the SAPI function call.
  2. Dump out some data returned from the SAPI function call.
  3. Quit the main loop (will terminate the program).

Here’s the code:

#include <glib.h>
#include <glib-unix.h>
#include <inttypes.h>

#include <tss2/tss2_sys.h>
    
#include "context-util.h"

typedef struct { 
    TSS2_SYS_CONTEXT *sapi_context;
    GMainLoop *loop;
} data_t;
    
gboolean
fd_callback (gint fd,
             GIOCondition condition,
             gpointer user_data)
{
    data_t *data = (data_t*)user_data;
    TSS2_RC rc;
    TPMI_YES_NO more_data;
    TPMS_CAPABILITY_DATA capability_data = { 0 };

    rc = Tss2_Sys_ExecuteFinish (data->sapi_context, 0);
    rc = Tss2_Sys_GetCapability_Complete (data->sapi_context,
                                          &more_data,
                                          &capability_data);
    g_print ("Capability: 0x%" PRIx32 "\n", capability_data.capability);
    g_print ("Capability data command count: %" PRIu32 "\n",
             capability_data.data.command.count);
    g_main_loop_quit (data->loop);
    return G_SOURCE_REMOVE;
}
int
main (void)
{
    GMainContext *context;
    GSource *source;
    TSS2_TCTI_CONTEXT *tcti_context;
    TSS2_RC rc;
    TSS2_TCTI_POLL_HANDLE poll_handles[1];
    data_t data;
    size_t poll_handle_count;

    /* setup TCTI & SAPI contexts */
    tcti_context = tcti_tabrmd_init ();
    g_assert (tcti_context != NULL);
    data.sapi_context = sapi_init_from_tcti_ctx (tcti_context);
    g_assert (data.sapi_context != NULL);
    /* get fds to poll for I/O events */
    rc = tss2_tcti_get_poll_handles (tcti_context,
                                     poll_handles,
                                     &poll_handle_count);
    g_assert (rc == TSS2_RC_SUCCESS);
    g_assert (poll_handle_count == 1);
    if (!g_unix_set_fd_nonblocking (poll_handles[0].fd, TRUE, NULL)) {
        g_error ("failed to set fd %d to non-blocking", poll_handles[0].fd);
    }
    /* setup GLib source to monitor this fd */
    source = g_unix_fd_source_new (poll_handles[0].fd, G_IO_IN);
    context = g_main_context_new ();
    g_source_attach (source, context);
    /* setup callback */
    g_source_set_callback (source, (GSourceFunc)fd_callback, &data, NULL);
    data.loop = g_main_loop_new (context, FALSE);
    g_main_context_unref (context);
    /* make initial get capability call */
    rc = Tss2_Sys_GetCapability_Prepare (data.sapi_context,
                                         TPM2_CAP_COMMANDS,
                                         TPM2_CC_FIRST,
                                         TPM2_MAX_CAP_CC);
    g_assert (rc == TSS2_RC_SUCCESS);
    rc = Tss2_Sys_ExecuteAsync (data.sapi_context);
    g_assert (rc == TSS2_RC_SUCCESS);
    /* run main loop */
    g_main_loop_run (data.loop);
    g_main_loop_unref (data.loop);
    /* cleanup TCTI / SAPI stuff */
    Tss2_Sys_Finalize (data.sapi_context);
    g_free (data.sapi_context);
    tss2_tcti_finalize (tcti_context);
    g_free (tcti_context);

    return 0;
}

Conclusion & Next Steps

Pretty straightforward all things considered. But this demonstration leaves a few things to be desired: Firstly, no application should have to dig directly into the TCTI to pull out the underlying fd. Instead we should have a function to create a GSource from the SAPI context directly, much like g_unix_fd_source_new does for raw fds.

Secondly, there is little in the program to convince us that the function call is truly asynchronous. To make our point we need additional tasks that the GMain loop can execute in the background while the TPM is handling our command. A GTK+ application with a chain of slow TPM2 functions that create a key hierarchy or perform some slow RSA key function (encrypt) while the UI remains responsive would be much more convincing. This may be the topic of a future post if I don’t get distracted.

Finally: this is a lot of machinery and it often comes under attack as adding too much complexity to the API. Instead I’d argue that the complexity here is the minimum necessary to enable easy integration of the TSS2 APIs into event driven programming frameworks. Most programmers will never have to interact with this machinery. Instead it should be used to create higher level APIs / objects that integrate into event-driven programming frameworks.

TPM2.0-TSS “1.0” release

I had hoped to cut this 1.0 release last week but we received a few last minute contributions that fixed a number of stability issues and a buffer overflow in the resourcemgr. So better late than never, the 1.0 release of the TPM2.0-TSS code has been tagged!

As part of the release I went through and did some maintainer B.S. that was long overdue: adding an AUTHORS and MAINTAINERS file, as well as updating the CHANGELOG and porting it over to markdown (so it’s now CHANGELOG.md).

So now the work on the next release begins … I’m tempted to make a pile of resolutions outlining where I want to focus my efforts initially but given the hectic schedule I know how inaccurate and difficult to live up to they will be. So for now it’s just forward progress on all fronts!

TPM2 Response Codes

For those of you struggling to use the TPM2 SAPI (system API) I’ll put the TLDR here hopefully to save you time and suffering:

  • Decoding TPM2 response codes is a PITA
  • I wrote a tool to make it a bit easier on you. It’s called tpm2_rc_decode and you can find it in the tpm2.0-tools repo.

For those of you interested in the back-story and some general information about TPM2 response codes: read on.

A maintainer’s priorities

I’ve come to realize that the single most important thing (to me) when working with a new library or tool is having a simple / easy debugging cycle. In the most simple case, like a libc function, returning an integer error code that’s easy to understand is crucial. On Linux, the standard man pages and the obligatory ‘return value’ section are usually sufficient.

So when I took over maintenance of the TPM2.0-TSS project a few months back the first thing I did was try to write a simple program to use the most basic parts of the API. The program I wrote called a single function from the TSS API (Tss2_Sys_GetCapability) but of course it didn’t work initially. Worse yet, the return value that I got from the function call wasn’t something that could be decoded with a simple lookup.

This triggered a sort of revelation where I realized that we’ve made it nearly impossible for people to create meaningful bug reports. Back in June, someone reported the same issue on the tpm2.0-tools project: it’s extremely time consuming to decode TPM response codes by hand. This in turn means that neither the TPM2.0-TSS, nor any of the projects consuming it, will get good bug reports from their users. Generally this means that they won’t have many users.

This is a pretty well bounded problem and seemed like an “easy win” that would solve two problems at the same time: make using the TPM2 TSS easier for users, and make for higher quality bug reports from said users.

TPM2 response code encoding

TPM2 response codes (henceforth RCs) aren’t simple integers. They’re unsigned 32bit integers with a pile of information encoded in them. The complexity has a purpose though: a single RC will tell you which part of the TPM software stack (TSS) produced the RC (from the TPM all the way up to the high level APIs), whether it’s an RC from the 1.2 or the 2.0 spec, which format the RC is in, the severity of the RC (warning vs error) and even which parameter caused the RC. Oh yeah, and what the actual error / warning code is. That’s a lot of data in my book.

The hardest part of decoding these RCs is tracking down data on the format and all of the other bits. A bit of reading and searching will turn up the following:

  • Part #1 of the spec (architecture) has a good overview and flow chart for decoding RCs in section 39.
  • Part #2 of the spec has the gory details on the bit fields and what they mean in section 6.6.
  • The TSS spec documents the response code layer indicators and the TSS specific RCs in section 6.1.2 (NOTE: this will change in the next iteration of the spec)

For the sake of keeping this post as accurate as possible for as long as possible I won’t reproduce much data from the specifications. That’s what they’re for. Instead I’ll keep things limited to discussion of the tool that I wrote, some of the major annoyances that I encountered and some of the work I’m doing upstream to fix things.

tpm2_rc_decode

The algorithm for decoding RCs in part 1 of the spec was a good starting point but it’s not sufficient. It omits the details around decoding RCs generated by software outside of the TPM. For this I had to account for the ‘layer indicator’ from the RC. The augmented algorithm is documented in the commit message for tpm2_rc_decode.

Once this algorithm was implemented and documented the tool mostly becomes an exercise in looking up strings in tables that map some integer value (the error code in bits 0 through 5 or 6, or the layer identifier in bits 16 through 23). I had hoped to be able to automate the generation of these tables from the specification but parsing a PDF is a pain in the ass. I’ll probably end up posting more on automating code generation from the TPM2 specs in the future so I won’t say much here.
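
To give you a feel for what the tool does, here’s a rough sketch of the masking involved using the bit positions mentioned above. The real tool also has to handle format 0 vs format 1 codes, warnings vs errors, the parameter / handle / session indicators and so on, so treat this as an illustration rather than a complete decoder:

#include <inttypes.h>
#include <stdio.h>

#define RC_LAYER_SHIFT 16
#define RC_LAYER_MASK  (0xffu << RC_LAYER_SHIFT)
#define RC_FMT1_BIT    (1u << 7)  /* format selector bit, see part 2 of the spec */

void
rc_decode_sketch (uint32_t rc)
{
    uint32_t layer = (rc & RC_LAYER_MASK) >> RC_LAYER_SHIFT;
    /* format 1 codes carry the error in bits 0-5, format 0 in bits 0-6 */
    uint32_t code  = rc & ((rc & RC_FMT1_BIT) ? 0x3fu : 0x7fu);

    printf ("layer: 0x%02" PRIx32 " code: 0x%02" PRIx32 "\n", layer, code);
    /* from here it's mostly looking these values up in string tables */
}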

The neglected TSS spec

This was really my first bit of work that required I dig into the RC portion of the TSS specification. When I started this work it was in pretty bad shape. The structure and definition of all of the TSS RCs and their meaning was well done but the language used in this part of the spec was horribly inconsistent. It interchanged terms like ‘error code’ with ‘response code’ and ‘return code’ seemingly at random. It played similar games with terms like ‘error level’ and ‘layer indicator’.

Now I’ve been called pedantic in the past. And I won’t argue with that label. Attention to detail is a hobby of mine. And I’m of the opinion that, when it comes to technical specifications, pedantry is a virtue. When you’re trying to pick up concepts like this with data densely packed into a single unsigned 32bit integer exactness is paramount.

There’s nothing worse (in my mind) than having loosely defined terms causing confusion between people new to the spec. As a maintainer it’s hard enough to figure out what noobs mean when they report a bug. If the terms they get from the specification can mean any number of things we’re increasing the likelihood of miscommunication and frustration.

Conclusion

So my first contribution to the TSS specification was the complete rewriting of the RC section. The values of the constants have remained the same but we now use consistent language for ‘layer indicator’, ‘layer shift’ etc and we’ve removed uses of terms like ‘error level’ and ‘error code’ in favor of ‘layer’ and ‘response’.

If you’re developing software using the SAPI, or using the tpm2.0-tools I highly encourage you to use the tpm2_rc_decode utility. It should make your life a lot easier and the lives of those you may end up communicating with when you’re trying to debug your code.

Finally this tool isn’t perfect. None of the RCs that are generated by the resource manager are handled yet so there’s plenty of room for improvement. If you’ve got the cycles and you’re sufficiently motivated I’d gladly take patches to improve the tool.

TPM2 software stack maintainership

My last post left some work in progress on TPM2 event log handling hanging. Sadly the state of that work hasn’t progressed since. And now for the excuse: I’ve taken over maintainership of the TPM2.0-TSS project.

If you’ve ever taken over a large project with minimal time to plan for the “hand-off” then you’ll understand how that can derail pretty much every other technical project that you may have in progress. So I’ve had to back-burner the Grub / event log stuff for the short term. My focus these days is reverse engineering the internals of the TPM2.0-TSS and generally trying to make sense of the thousands of pages of relevant specs.

On the upside this is a great opportunity to get some much needed documentation and refactoring done for the TSS source tree. As the previous maintainer told me, the documentation for this code is the TPM specifications. This is synonymous with “no documentation” in my mind because while the TPM2 specs describe how a TPM and the associated software stack should behave, they say nothing about how the source code is structured. This is the proverbial uphill battle that comes with taking over a “legacy software” project.

I’ve got a few posts in my backlog of lessons learned from trying to get my brain wrapped around this project. Up till now I’d just been making sure things built well and easily on Linux platforms and so my knowledge of the guts of the infrastructure was limited. Now that I’m neck deep in it I’ll do my best to get some lessons learned up here on the blog and break the writer’s block I’ve been suffering.

The first hurdle I had to overcome was making sense of the TPM2 response codes and their inherent complexity. After that I’m hoping to pick apart some of the unit tests that I’ve been pushing these last few weeks since those are mostly how I’m learning about the structure of the source code.

TPM2 UEFI Measurements and Event Log

UPDATE: I’ve continued this work as “pure” UEFI shell executables and so some of the links to this work with Grub2 have gone dead. The updated version of this work can be found here: https://twobit.org/2019/05/21/uefi-tpm2-examples/. Work to integrate these tools into Grub2 is still on-going.

Has it really been over 6 months since I last wrote something on the blog? Crap. The standard excuses apply: busy busy busy. I got sucked into the OSS TPM2 software stack project @ Intel in a huge way.

I screwed up something fundamental along the way though: I moved on to this new project before finishing off an older project. So I’m here to set things right. The OpenXT summit is in a week so I’m cleaning out my backlog and it’s time to write up the stuff I was working on back before I fell down the TPM2 TSS hole. Come to think of it I’ve got some interesting stuff to write about the TPM2 TSS work but first things first: TPM2 in EFI and Grub2.

This is likely the first post in what will become a series on the subject. I’ll try to keep this post to an overview of the TPM2 EFI protocol, the basics of what I’ve been adding to grub and a slightly deeper dive into the pre-boot TPM2 event log that’s maintained by the TPM2 EFI driver / firmware. Future posts will pick up the remaining bits and get into the details of how this was integrated into meta-measured.

TPM2 UEFI recap

I left off my last post describing some oddities I ran into in the TCG TPM 2.0 EFI Protocol specification. With that issue out of the way I spent some time playing around with the functions exposed by the protocol (in the UEFI sense) and what they would allow us to do in the pre-boot environment. Naturally there’s a purpose here: measuring stuff.

Before we get too far into the post: the code I’ve been developing was linked here originally but those links have since gone dead, so see the update at the top of this post for the current location. It’s been a few months since I seriously slowed down work on this to focus on the TSS for Linux but I intend to revive it so that I can finish it off in the near future. Intel has already demonstrated this code @ the TCG event @ RSA 2016 so it’s functional but still just ‘demo quality’.

The Protocol

The stuff we want to measure here is the code and the data that was run as part of booting our system. In this case we want to extend the measurement chain from the firmware up through Grub2 to the OS. Despite past efforts on a Grub2 fork known as “trusted grub” the majority of Linux systems have a gap between measurements taken by the firmware and measurements that are taken by kernel or user space code.

In our efforts to fill this gap, let’s start by taking a quick look at the functions that are exposed by the TPM2 UEFI protocol. In UEFI the concept of a ‘protocol’ isn’t what I typically think of when I hear the word. When I hear ‘protocol’ the first thing that comes to mind is a network protocol like TCP or IP. In the context of UEFI it’s a bit different: it defines the interface between UEFI applications and some set of services offered by the UEFI runtime.

The interaction model is very basic: You provide the UEFI runtime with a UUID identifying the protocol you want to use and it will return to you a structure. This structure is protocol specific and it’s effectively a table of function pointers. For the TPM2 there are 7 commands and thus 7 entries in this structure, one for each function.

The TPM2 EFI Protocol Specification documents this structure and the functions it exposes in section 6. My implementation pulls the structure directly from the spec as follows:

typedef struct tdEFI_TCG2_PROTOCOL {
  EFI_TCG2_GET_CAPABILITY                      GetCapability;
  EFI_TCG2_GET_EVENT_LOG                       GetEventLog;
  EFI_TCG2_HASH_LOG_EXTEND_EVENT               HashLogExtendEvent;
  EFI_TCG2_SUBMIT_COMMAND                      SubmitCommand;
  EFI_TCG2_GET_ACTIVE_PCR_BANKS                GetActivePcrBanks;
  EFI_TCG2_SET_ACTIVE_PCR_BANKS                SetActivePcrBanks;
  EFI_TCG2_GET_RESULT_OF_SET_ACTIVE_PCR_BANKS  GetResultOfSetActivePcrBanks;
} GRUB_PACKED EFI_TCG2_PROTOCOL;

Each of these members is a function pointer with a prototype / parameters etc just like any other function in C. The spec has all of the gory details, including the updated data about the unpacked structure returned by GetCapability. But given our stated goals above: measuring stuff, we really need just one command: HashLogExtendEvent.

This is probably a good time to reflect on how great UEFI is. In the PC BIOS world we would be writing assembly code to interact with the TPM using memory mapped IO or something. In UEFI we call a function to get a table of pointers to other functions and we then call one of these functions. All of this can be done in C. Pretty great IMHO. And if you’re working in the Grub2 environment Grub already has wrappers for invoking the UEFI command to get access to the protocol structure and to invoke functions from it.
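
For reference, here’s a sketch of what that lookup looks like from within Grub using grub_efi_locate_protocol. The GUID value is the EFI_TCG2_PROTOCOL GUID as I recall it from the TCG spec, so verify it against your copy of the spec before trusting it:

/* EFI_TCG2_PROTOCOL GUID per the TCG EFI Protocol spec (verify this value) */
static grub_efi_guid_t tpm2_prot_guid =
  { 0x607f766c, 0x7455, 0x42be,
    { 0x93, 0x0b, 0xe4, 0xd7, 0x6d, 0xb2, 0x72, 0x0f } };

EFI_TCG2_PROTOCOL *tpm2_prot;

tpm2_prot = grub_efi_locate_protocol (&tpm2_prot_guid, NULL);
if (tpm2_prot == NULL)
  return grub_error (GRUB_ERR_IO, "EFI_TCG2_PROTOCOL not available");
/* tpm2_prot->HashLogExtendEvent, ->GetCapability etc are now callable */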

On my Minnowboard Max (MBM) this is pretty easy. Add an FTDI serial cable (the one on sparkfun is way over priced but really nice) and you don’t even need a monitor. Anyways, back to it.

HashLogExtendEvent

The TPM is a pretty complicated little thing but its fundamental function, extending data into PCRs, is pretty simple. All you need is a buffer holding data, the size of the buffer, and a data structure describing the extend event. This last bit of data is used by the UEFI firmware to update the log of extend events (aka: “stuff we’ve measured”).

In my grub code I’ve broken this down into two functions. The first is a thin wrapper over the HashLogExtendEvent function that takes the data buffer,size and EFI_TCG2_EVENT structure as a parameter. The second is more of a convenience function that builds up the EFI_TCG2_EVENT structure for the caller using “standard” values. The two prototypes are as follows:

grub_err_t
grub_tpm2_extend (EFI_TCG2_PROTOCOL *tpm2_prot,
                  grub_efi_uint64_t  flags,
                  char              *data,
                  grub_efi_uint64_t  data_size,
                  EFI_TCG2_EVENT    *event);
grub_err_t
grub_tpm2_extend_buf (grub_uint8_t      *buf,
                      grub_efi_uint64_t  buf_len,
                      const char        *desc,
                      grub_uint32_t      pcr);

The EFI_TCG2_EVENT structure is pretty simple but it’s annoying to calculate the size values for each call so the convenience function is a good way to keep code sizes manageable. Take a look at the code here if you’re interested in the details.

When Grub does something like load a module of code, or a command from a config file or a kernel / initrd image we measure it into a TPM PCR using the grub_tpm2_extend_buf function providing it the data to hash, the length of the memory buffer holding said data, a brief description of the event (like “linux kernel” or “loadable module”) and the PCR we want to extend with the measurement.
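
As an example, measuring a freshly loaded kernel image might look something like the sketch below. The kernel_buf / kernel_size names are hypothetical stand-ins for whatever buffer Grub loaded the image into, and PCR 8 follows the convention for binary blobs described later in this post:

grub_err_t err;

err = grub_tpm2_extend_buf ((grub_uint8_t *) kernel_buf,
                            kernel_size,
                            "linux kernel",
                            8);
if (err != GRUB_ERR_NONE)
  grub_printf ("failed to measure the linux kernel into PCR 8\n");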

Probably worth noting is that the current implementation uses the string obtained from the grub.cfg file in the Event field. For now this is a debugging mechanism. For a deployed implementation taking input directly from the user (the grub.cfg file should be considered as such) is pretty risky. Instead it’s probably better to use a generic string that describes what the input data is, but doesn’t contain the data directly.

TPM2 Preboot Event Log

It’s very difficult to extract meaning from the contents of a TPM’s PCRs. Up till now we’ve covered the bits necessary to extend data into a PCR from within Grub. We can extend value after value into it and at some time in the future read it back. But when someone looks at the value they’ll likely want to recreate it to verify every component that went into generating the final value. The name of the UEFI protocol function that extends a value into the PCR is particularly descriptive because, like it says, it also updates the preboot measurement log. This log is where we go to get the data we need to understand our PCR values.

Being able to parse this log and view the contents is also an integral part of verifying or debugging the work I’ve been doing. In this vein, after I managed to get a call to extend a value into a PCR doing more than returning an error code (yeah it happens a lot to me for some reason) I immediately wrote a bit of code to parse and dump / “prettyprint” the audit log.

Grub has a built in shell and an interface to load code modules and commands. This seemed like the “right” mechanism to load commands that I could execute on demand. If you boot grub without a config file you’ll land immediately at the shell and so I found myself debugging by running grub and then executing commands to extend arbitrary data into the log and then parsing the log and dumping it to the console output. If you’ve got a serial console hooked up you can capture this output using minicom or whatever. The commands that parse the log can be found in a module I’m calling tpm2cmd until I come up with a better name. This module has a few commands in it but to dump the event log the tpm2dumplog command is what you’ll need.

The code that walks through the event log data structure is pretty boring. It’s tightly packed so there’s a bit of work to be done in calculating offsets but it’s not rocket surgery. The interesting part is in interpreting the log and figuring out what to do with the data.

Let’s capture an audit log using these new commands. First we need to be able to boot grub and get to the grub shell. I’ve been integrating this work into the meta-measured OpenEmbedded meta-layer so if you want to see this at work you can build the core-image-tpm image or download one from my meta-measured autobuilder. If you want to build Grub by hand and install it on a thumb drive the results should be the same. Just grab the code from my github. I won’t cover the details for this here though.

Assuming you’ve got the latest core-image-tpm image from meta-measured built, just dd the ISO onto a thumb drive and boot it (on a UEFI system with a TPM2 of course). If you’re grabbing the ISO from my autobuilder the target MACHINE is an MBM running the 32bit firmware so YMMV on any other platform. When you finally get to the grub menu hit ‘c’ to get a shell. From here the tpm2cmd module provides a few functions (tpm2dumpcaps, tpm2dumplog and tpm2extend) that you can execute to play around. But before we get too far into that it’s important to note that the firmware can support either the new TPM2 event log format or the old TPM 1.2 format. The capabilities structure will tell you which your firmware supports. On the Minnowboard Max I only get the 1.2 format:

grub> tpm2dumpcaps
TPM2 Capabilities:
  Size: 0x1c
  StructureVersion:
    Major: 0x01
    Minor: 0x00
  ProtocolVersion:
    Major: 0x01
    Minor: 0x00
  HashAlgorithmBitmap: 0x00000003
    EFI_TCG2_BOOT_HASH_ALG_SHA1: true
    EFI_TCG2_BOOT_HASH_ALG_SHA256: true
    EFI_TCG2_BOOT_HASH_ALG_SHA384: false
    EFI_TCG2_BOOT_HASH_ALG_SHA512: false
    EFI_TCG2_BOOT_HASH_ALG_SM3_256: false
  SupportedEventLogs: 0x00000001
    EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2: true
    EFI_TCG2_EVENT_LOG_FORMAT_TCG_2: false
  TPMPresentFlag: 0x01 : true
  MaxCommandSize: 0x0f80
  MaxResponseSize: 0x0f80
  ManufacturerID: 0x494e5443
  NumberOfPcrBanks: 0x00000000
  ActivePcrBanks: 0x00000000

The tpm2dumplog function is smart enough to check this structure before walking the log. It will prefer the TPM2 event log format if both are supported. There’s even a --format option so you can select which you want if both are supported but I haven’t had a chance to test this since I’ve not yet found a system with firmware supporting the 2.0 format. Booting straight into the grub shell and executing the tpm2dumplog command produces the log file below. Let’s pick a few example entries and see if we can figure out what they mean.

grub> tpm2
Possible commands are:

 tpm2dumpcaps tpm2dumplog tpm2extend tpm2sendlog
grub> tpm2dump
Possible commands are:

 tpm2dumpcaps tpm2dumplog
grub> tpm2dumplog 
TPM2 EventLog
  start: 0x79373000
  end: 0x79373c98
  truncated: false
prettyprint_tpm12_event at: 0x79373000
  PCRIndex: 0
  EventType: EV_S_CRTM_VERSION (0x00000008)
  digest: 1489f923c4dca729178b3e3233458550d8dddf29
  EventSize: 2
  Event: 
prettyprint_tpm12_event at: 0x79373022
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 76bd373351e3531ae3e2257e0b07951a2f61ae42
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x79373052
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: f46823e31fcb3059dbd9a69e5cc679a4465ea318
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x79373082
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 87ce3bc3cd17fe797cc04d507c2f8f3b6c418552
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x793730b2
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 8725eb98a49d6227459c6c90699c5187c5f11e16
  EventSize: 16
  Event: 
prettyprint_tpm12_event at: 0x793730e2
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 57cd4dc19442475aa82743484f3b1caa88e142b8
  EventSize: 53
  Event: a????
prettyprint_tpm12_event at: 0x79373137
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 9b1387306ebb7ff8e795e7be77563666bbf4516e
  EventSize: 36
  Event: a????
prettyprint_tpm12_event at: 0x7937317b
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 9afa86c507419b8570c62167cb9486d9fc809758
  EventSize: 38
  Event: a????
prettyprint_tpm12_event at: 0x793731c1
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 5bf8faa078d40ffbd03317c93398b01229a0e1e0
  EventSize: 36
  Event: ?:=?E????geo
prettyprint_tpm12_event at: 0x79373205
  PCRIndex: 7
  EventType: EV_EFI_VARIABLE_DRIVER_CONFIG (0x80000001)
  digest: 734424c9fe8fc71716c42096f4b74c88733b175e
  EventSize: 38
  Event: ?:=?E????geo
prettyprint_tpm12_event at: 0x7937324b
  PCRIndex: 7
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937326f
  PCRIndex: 1
  EventType: EV_EFI_HANDOFF_TABLES (0x80000009)
  digest: ed620fde0f449cec32fc0eaa040d04e1f1888b25
  EventSize: 24
  Event: 
prettyprint_tpm12_event at: 0x793732a7
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: aae4f77a57d91bf7beeee06e053a73eec78cc9ec
  EventSize: 62
  Event: a????
prettyprint_tpm12_event at: 0x79373305
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: d2ab4e0ed4185d5c846fb56980907a65aa8d3103
  EventSize: 168
  Event: a????
prettyprint_tpm12_event at: 0x793733cd
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 7c7e40269acd5fb401b6f49034b46699bc5c5777
  EventSize: 136
  Event: a????
prettyprint_tpm12_event at: 0x79373475
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: abb0830c9bef09a2c108ceba8f0ff8438991b20a
  EventSize: 206
  Event: a????
prettyprint_tpm12_event at: 0x79373563
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 79d23fb4ea34f740725783540657c37775fb83b0
  EventSize: 239
  Event: a????
prettyprint_tpm12_event at: 0x79373672
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: b8d59f3cde8b1fc5b89ff0b61f7096903734accb
  EventSize: 117
  Event: a????
prettyprint_tpm12_event at: 0x79373707
  PCRIndex: 5
  EventType: EV_EFI_VARIABLE_BOOT (0x80000002)
  digest: 22d22e14e623d1714cd605fef81e114ad8882d78
  EventSize: 121
  Event: a????
prettyprint_tpm12_event at: 0x793737a0
  PCRIndex: 5
  EventType: EV_EFI_ACTION (0x80000007)
  digest: cd0fdb4531a6ec41be2753ba042637d6e5f7f256
  EventSize: 40
  Event: Calling EFI Application from Boot Option
prettyprint_tpm12_event at: 0x793737e8
  PCRIndex: 0
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937380c
  PCRIndex: 1
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373830
  PCRIndex: 2
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373854
  PCRIndex: 3
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x79373878
  PCRIndex: 4
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x7937389c
  PCRIndex: 5
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x793738c0
  PCRIndex: 6
  EventType: EV_SEPARATOR (0x00000004)
  digest: 9069ca78e7450a285173431b3e52c5c25299e473
  EventSize: 4
  Event: 
prettyprint_tpm12_event at: 0x793738e4
  PCRIndex: 5
  EventType: EV_EFI_ACTION (0x80000007)
  digest: b6ae9742d3936a4291cfed8df775bc4657e368c0
  EventSize: 47
  Event: Returning from EFI Application from Boot Option
prettyprint_tpm12_event at: 0x79373933
  PCRIndex: 4
  EventType: EV_EFI_BOOT_SERVICES_APPLICATION (0x80000003)
  digest: 2d13d06efd0330decd93f992991b97218410688d
  EventSize: 64
  Event: ?kx
prettyprint_tpm12_event at: 0x79373993
  PCRIndex: 4
  EventType: EV_EFI_BOOT_SERVICES_APPLICATION (0x80000003)
  digest: e0fcc7f88f096737f8a95d7a86e7a4b33c365ebc
  EventSize: 145
  Event: Pqx
prettyprint_tpm12_event at: 0x79373a44
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 5c88780f029068d6f5863b943575556d7b98c558
  EventSize: 76
  Event: Grub2 command: serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
prettyprint_tpm12_event at: 0x79373ab0
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 392dc11bf1eea9e5e933e0e401fba80fda90f912
  EventSize: 28
  Event: Grub2 command: default=boot
prettyprint_tpm12_event at: 0x79373aec
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: b993a2e236f21b72a05e401d241a5e7617367738
  EventSize: 26
  Event: Grub2 command: timeout=10
prettyprint_tpm12_event at: 0x79373b26
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: f6afd73c21afe5a5136e7fdf7068d71fb4fc014b
  EventSize: 148
  Event: Grub2 command: menuentry boot {
linux /vmlinuz LABEL=boot root=/dev/ram0 console=ttyS0,115200 console=ttyPCH0,115200 console=tty0 
initrd /initrd
}
prettyprint_tpm12_event at: 0x79373bda
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 02775bb98cd27a98178eee72a1cd66bec285c839
  EventSize: 158
  Event: Grub2 command: menuentry install {
linux /vmlinuz LABEL=install-efi root=/dev/ram0 console=ttyS0,115200 console=ttyPCH0,115200
console=tty0 
initrd /initrd
}
prettyprint_tpm12_event at: 0x79373c98
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 69dd51ec4450a2e7b6f3da83370998908aa3370e
  EventSize: 27
  Event: Grub2 command: tpm2dumplog
grub>

The fields here are documented in the specification (version 1.22 of the TPM EFI Protocol spec this time though and in section 3.1.3) so I won’t cover them in detail. Instead let’s look at the first two events:

prettyprint_tpm12_event at: 0x79373000
  PCRIndex: 0
  EventType: EV_S_CRTM_VERSION (0x00000008)
  digest: 1489f923c4dca729178b3e3233458550d8dddf29
  EventSize: 2
  Event: 
prettyprint_tpm12_event at: 0x79373022
  PCRIndex: 0
  EventType: EV_EFI_PLATFORM_FIRMWARE_BLOB (0x80000008)
  digest: 76bd373351e3531ae3e2257e0b07951a2f61ae42
  EventSize: 16
  Event: 

Not a lot of data in these entries. Notably the ‘Event’ field is blank (this is where a textual description of the event should be). The first EventType is the telling bit though (that and that it’s the first thing measured). This is the first measurement and thus the CRTM, well the CRTM version number at least? The second is much more generic: just some blob of firmware from the platform. What I do find puzzling though is that the first event is listed as 2 bytes long and the second as 16. This length is of the textual event data, the description that looks to be missing. What’s really happening here is that my code treats this field as a NULL terminated string, and the first byte is NULL so the code just doesn’t print anything. This is probably something worth investigating in the future (note to self).

This provides us with an interesting view of the boot process. Neither of these measurements (or the string of events that land in PCR[0]) have much meaning from this angle but the MBM firmware can be built from the EDK2 sources so it may be that with enough code reading we could divine where these values came from. The best thing we have to go on (without additional cooperation from the platform firmware provider) is the EventType and the PCR index where the data was extended. The TCG PC Client Platform Firmware Profile defines “PCR Usage” in section 2.2.4 and PCR[0] is for “SRTM, BIOS, Host Platform Extensions, Embedded Option ROMs and PI Drivers” so basically “firmware”.

For the code that measures the bits that grub loads and depends upon (modules and configuration data) we use PCRs 8 and 9. According to the PC Client Platform Firmware Profile 8 and 9 are “Defined for use by the Static OS” which I guess includes the bootloader. According to convention (passed down to me by word of mouth) the even numbered PCRs are where binary data gets hashed and the odd PCRs are for configuration data. So I’m using PCR[8] for Grub modules and binary blobs (kernel, initrd) loaded by grub through the linux command. PCR[9] is where we extend the configuration data loaded by Grub and the commands that Grub executes for the user when they’re in the Grub shell.

The first event in the log for PCR[9] recorded by the core-image-tpm image from meta-measured is the command to set up the serial port. This comes straight from the grub.cfg produced by the openembedded-core recipe:

prettyprint_tpm12_event at: 0x79373a44
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: 5c88780f029068d6f5863b943575556d7b98c558
  EventSize: 76
  Event: Grub2 command: serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1

And you can verify this is the correct value on the console if you want to check my hashing logic:

$ echo -n "serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1" | sha1sum
5c88780f029068d6f5863b943575556d7b98c558  -

Transferring the Pre-Boot Event Log to the Kernel

What’s conspicuously missing though are any measurements of Grub’s components (they would be in PCR[8]). The reason for this: building the TPM2 image from meta-measured produces an “embedded” Grub2 configuration with all modules built directly into the EFI executable so no modules to load. So what about the kernel and initrd you ask? In this example I’ve booted into the Grub shell so we haven’t booted a kernel yet. If we do boot the kernel we end up in a weird situation. You’d expect to see the following entries (that I’ve caused on my system manually by executing linux (hd0,msdos2)/vmlinuz and a similar command for the initrd):

prettyprint_tpm12_event at: 0x79373e13
  PCRIndex: 9
  EventType: EV_IPL (0x0000000d)
  digest: bfbd2797ce79ccf12c5e1abbf00c10950e12a8db
  EventSize: 42
  Event: Grub2 command: linux (hd0,msdos2)/vmlinuz
prettyprint_tpm12_event at: 0x79373e5d
  PCRIndex: 8
  EventType: EV_IPL (0x0000000d)
  digest: f50f32b01c3b927462c179d24b6ab8018b66e7bc
  EventSize: 40
  Event: Grub2 Linux initrd: (hd0,msdos2)/initrd

But once the kernel takes over the preboot event log is destroyed. The TPM2 spec is different from the 1.2 version in that there isn’t an ACPI table defined for the preboot audit log. It lives only in the UEFI runtime and when Grub calls the ExitBootServices function the event log is destroyed. So all of this code to measure various bits of Grub will produce data that the kernel and user space will never be able to analyze unless we come up with a mechanism to transfer the log to the kernel. This is surprisingly more difficult than I was hoping.

The two solutions I’ve come up with / run across are:

  1. Create a new UEFI config table and copy the contents of the audit log into it.
  2. Have the kernel copy the table before calling ExitBootServices itself, which requires that Grub be patched to not call this function itself (an idea I got from a conversation with Matthew Garrett).

Both of these approaches have their merits and their drawbacks. We’ll discuss those in more detail in the next post.

Measured Launch on OE core

It’s been 4 months since my last post but I’ve been working on some fun stuff. Said work has progressed to the point where it’s actually worth talking about publicly so I’m crawling out from under my favorite rock and putting it “out there”.

My last few bits of writing were about some random OpenEmbedded stuff, basically outlining things I was learning while bumbling my way through the OE basics. I’ve been reading through the meta-selinux and meta-virtualization layers and they’re a great place to learn. Over the winter Holiday here I had some extra vacation time from my day job to burn so I finally got serious about a project I’ve been meaning to start for way too long.

meta-measured

Over the past year I’ve been thinking a lot about the “right way” to measure a software system. We’ve implemented a measurement architecture on XT but this has a few down sides: First a system as large as XT is very difficult to use as a teaching tool. It’s hard to explain and show someone the benefits of measuring a system when your example is large, complex and the relevant bits are spread throughout the whole system. Even our engineers who know our build system inside and out often get lost in the details. Second the code belongs to Citrix and closed source software isn’t very useful to anyone except the people selling it.

So after reading through the meta-selinux and meta-xen layers a bunch and learning a good bit about writing recipes I’ve started work on a reference image for a “measured system”. I’m keeping the recipes that make up this work in a layer I call ‘meta-measured’. For this first post on the topic of measured systems I’ll stick to discussing the basic mechanics of its construction. This includes some data on the supporting recipes and some of the component parts necessary for booting it. Hopefully along the way I’ll be able to justify the work by discussing the potential benefits to system security but the theory and architecture discussions will be left for a later post.

get the source

If you’re interested in just building it and playing with the live image this is where you should start. Take a look and let me know what you think. Feedback would be much appreciated.

All of the work I’ve done to get this first bootable image working is up on my github. You can get there, from here: https://github.com/flihp. The ‘meta-measured’ layer is here: https://github.com/flihp/meta-measured.git. To automate setting up a build environment for this I’ve got another repo with a few scripts to checkout the necessary supporting software (bitbake / OE / meta-intel etc), a local.conf (which you may need to modify for your environment), and a script to build the ‘iso’ that can be written to a USB drive for booting a test system: https://github.com/flihp/measured-build-scripts.

The best way to build this currently is to checkout the measured-build-scripts repo:

git clone git://github.com/flihp/measured-build-scripts.git

run the ‘fetch.sh’ script to populate the required git submodules and to clone the meta-measured layer:

cd measured-build-scripts
./fetch.sh

build the iso

If you try to run the ./build.sh script next as you would think you should, the build will fail currently. It will do so while attempting to download the SINIT / ACM module for TXT / tboot because Intel hides the ACMs behind a legal terms wall with terms that must be accepted before the files can be downloaded. I’ve put the direct link to it in the recipe but the download fails unless you’ve got the right cookie in your browser so wget blows up. Download it yourself from here: http://software.intel.com/en-us/articles/intel-trusted-execution-technology, then drop the zip into your ‘download’ directory manually. I’ve got the local.conf with DL_DIR hardwired to /mnt/openembedded/downloads so you’ll likely want to change this to suit your environment.

Anyway I’ll sort out a way to fool the Intel lawyer wall eventually … I’m tempted to mirror these files since the legal notice seems to allow this but I don’t really have the bandwidth ATM. Once you’ve got this sorted, run the build.sh script. I typically tee the output to a file for debugging … this is some very ‘pre-alpha’ stuff so you should expect to debug the build a bit 🙂

./build.sh | tee build.log

This will build a few images from the measured-image-bootimg recipe (tarballs, cpios, and an iso). The local.conf I’ve got in my build directory is specific to my test hardware, so if you’ve got an Intel SugarBay system to test on you can dump the ISO directly to a USB stick and boot it. If you don’t have a SugarBay system you’ll have to do some work to get it booting, since this measured boot stuff is closely tied to the hardware, though the ACMs I’ve packaged cover 2nd and 3rd gen i5 and i7 hardware (Sandy and Ivy Bridge).
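
Writing the ISO to a USB stick is just the usual dd dance. A minimal sketch, assuming the image name below (a guess based on the recipe and machine names) and that your stick shows up as /dev/sdb; triple-check the device node before running it:

# dd the iso produced by the build onto the USB stick
sudo dd if=tmp/deploy/images/measured-image-bootimg-sugarbay.iso of=/dev/sdb bs=4M
sync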

recipes

I’ve organized the recipes that make up this work into two categories: those that are specific to the TPM and those that are specific to TXT / tboot. Each of these two technologies requires some kernel configs, so those are separated out into fragments like the ones I’ve found in other layers. My test hardware also has USB 3.0 ports, which the base OE kernel configs don’t seem to cover yet, so I’ve included that config in my oe-measured distro just so I can use the ports on the front of my test system.
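
For the curious, the fragments boil down to just a handful of options. This is a hedged sketch of what they typically contain; the file names are mine, not necessarily what’s in the layer:

# tpm.cfg: TPM chip and TIS interface drivers
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y

# txt.cfg: kernel side of Intel TXT / tboot support
CONFIG_INTEL_TXT=y

# usb3.cfg: xHCI driver for the USB 3.0 ports on my test box
CONFIG_USB_XHCI_HCD=y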

The TPM recipes automate building the trousers daemon, libtspi and some user space tools that consume the TSS interface. Recipes for the TPM software are pretty straightforward since most are autotools projects. Some work was required to split the trousers project into separate packages for the daemon and the library, roughly as sketched below.
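
The split is just the usual bitbake packaging knobs; something along these lines (illustrative, not a copy of the actual recipe):

# ship the TSS library in its own package, separate from the daemon
PACKAGES =+ "libtspi"
FILES_libtspi = "${libdir}/libtspi.so.*"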

The tboot recipes were a bit more work because tboot carries a bunch of utilities in the main tboot source tree, so they had to be separated out into different packages (this work is still on-going). Further, tboot doesn’t use autotools and its build squashes most of the compiler flags that the OE environment passes in. The compiler flags required by tboot are hard-coded, which is at odds with OE and a cross-compiled environment that wants to control the path to everything, including the compiler.

I’ve no clue whether tboot will build properly on anything other than an Intel system. Further, Intel hiding the ACMs required for their chipsets behind an EULA wall is annoying, since the default OE fetcher won’t work.

images

My first instinct is always to describe a system by construction: from the bottom up. In this case I think going top-down is a better approach, so we’ll start with the rootfs and work backwards. The TPM recipes include two images based on the core-image from OE core: one initramfs image and one rootfs. The rootfs is just the core-image with the TPM kernel drivers, the trousers daemon, tpm-tools and tpm-quote-tools added. I haven’t done much with this rootfs other than booting it up to see whether TXT and the TPM work as expected.

There’s also an initramfs with the TPM kernel drivers, trousers daemon and the tpm-tools, but not the quote tools. This is a very minimal initramfs with the TSS daemon loaded manually in the initrd script. Users aren’t expected to run the tpm-tools interactively here, but that’s what I’ve been doing for initial testing. Only the tpm_extendpcr tool (open source from Citrix) is used, to extend a PCR with the sha1sum of the rootfs before the call to switch_root. This requires that the ‘coreutils’ package be included just for that one utility, which unfortunately bloats the initramfs; slimming this down shouldn’t be too much work in the future. Anyway, I think this is ‘the right way’ to extend the measurement chain from the initramfs up to the rootfs of the system.
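
The relevant chunk of the initrd script amounts to something like the following. Treat it as a sketch: the variable names are mine, and the tpm_extendpcr options in particular are illustrative rather than a copy of the tool’s real interface:

# start the TSS daemon so we can talk to the TPM from the initramfs
/usr/sbin/tcsd

# hash the rootfs and extend it into the PCR named on the kernel command
# line (ROOTFS_IMAGE and ROOTFS_PCR are placeholders parsed from /proc/cmdline)
ROOTFS_HASH=$(sha1sum "${ROOTFS_IMAGE}" | cut -d ' ' -f 1)
tpm_extendpcr -p "${ROOTFS_PCR}" -h "${ROOTFS_HASH}"

# only once the measurement is recorded do we hand control to the rootfs
exec switch_root /root /sbin/init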

The rest of the measurements we care about are taken care of by the components from the TXT recipes. There’s only one image in the TXT recipe group, however. It’s derived from the OE core live image and it’s intended to be ‘deployable’ in the language of OE recipes. I think this means an hddimg or an ISO image, basically something you can ‘dd’ to disk and boot. Currently it’s the basis for a live image, but it could easily be used for something like an installer simply by switching out the rootfs.

This image is not a separate root filesystem; instead it’s an image created with the files necessary to boot the system: syslinux (configured with the mboot.c32 comboot module), tboot, the ACMs, and the initrd and rootfs from the TPM recipes. tboot measures the bootloader config, all of the boot modules and a bunch of other stuff (see the README in the tboot sources for details). It stores these measurements in the TPM for us, creating the ‘dynamic root of trust for measurement’ (DRTM).
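
On the syslinux side this is just an mboot.c32 entry that chains tboot, the kernel, the initrd and the ACM together as multiboot modules; a sketch with placeholder file names:

label measured
  kernel /mboot.c32
  append /tboot.gz logging=serial,memory,vga --- /vmlinuz console=ttyS0 --- /initrd.cpio.gz --- /sinit_acm.bin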

Once tboot has measured all of the modules, the initramfs takes over and measures the rootfs as described above before the switch to root. I’ve added a few kernel parameters to pass the name of the rootfs and the PCR where its measurement is to be stored.

If the rootfs is measured on each boot it must be mounted read-only to prevent its measurement from changing … yup, even mounting a journaled file system read-write will modify the journal and change the filesystem. Creating a proper read-only image is a bit of work, so for this first prototype I’ve used a bit of a shortcut: I mount the rootfs read-only, create a ramfs read-write, then combine the two in a unionfs. In this configuration the booted rootfs looks like a read / write mount, yet on each boot the measurements in the TPM are the same.
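
In the initramfs that works out to something like this; the device and mount points are placeholders and the unionfs option syntax is from memory, so treat it as a sketch:

# mount the measured rootfs read-only and a ramfs for scratch space
mount -o ro "${ROOTFS_DEV}" /ro
mount -t ramfs ramfs /rw

# union the two so the running system sees what looks like a writable /
mount -t unionfs -o dirs=/rw=rw:/ro=ro unionfs /root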

Next Steps

Measuring a system is all well and good, but who cares? Measurements are only useful when they’re communicated to external parties. For now this image only takes measurements, and those measurements are the same on each boot. That’s it. Where this becomes immediately useful is that the measurements can be predicted at build time.

PCRs 0-7 are reserved for the BIOS and we have no way of predicting those values currently, since they’re unique to the platform, and that’s messy. The tboot PCRs however (17, 18 and 19 in the Legacy mapping currently used) can be calculated based on the hashing done by tboot (read their docs and http://www.mail-archive.com/tboot-devel@lists.sourceforge.net/msg00069.html). The PCR value containing the measurement of the rootfs can be calculated quite simply as well.
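
Replaying the extend operation is all it takes: each extend sets the new PCR value to SHA1(old value || measurement) over the raw 20-byte digests. A quick sketch of that arithmetic in shell, assuming the PCR starts at zeros and is extended once with the rootfs hash (the image file name is a placeholder):

# the measurement: sha1 of the rootfs image, the same value the initramfs extends
MEAS=$(sha1sum rootfs.ext3 | cut -d ' ' -f 1)

# an unused PCR starts out as 20 bytes of zeros
OLD=0000000000000000000000000000000000000000

# PCR_new = SHA1(PCR_old || measurement), computed over the raw bytes
NEW=$( { echo -n "$OLD" | xxd -r -p; echo -n "$MEAS" | xxd -r -p; } | sha1sum | cut -d ' ' -f 1 )
echo "predicted PCR value: $NEW"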

For a reference live image this is interesting only in an academic capacity. As I suggest above though, this image can be used as a template for something like an installer, which would give the predictability of PCR values much deeper meaning. Consider an installer architecture where the installer itself is a very small rootfs that downloads the install package from a remote server (basically Debian’s netboot iso or a PXE boot setup). Assuming we have a method for exchanging system measurements (more future work), it would be very useful for the remote server to be able to evaluate measurements from the installer before releasing the install package.

This is probably a good place to wrap up this post. The meta-measured layer I’ve described is still very new and the images I’ve built are still only useful for ‘tire-kicking’. My next post will hopefully discuss predicting measurement values in the build system and other fun stuff.

openembedded yocto native hello world

NOTE: I took the time to get to the bottom of the issue discussed in this post. There’s a new post here that explains the “right way” to use Makefiles with yocto. As always, the error in this post was mine 🙂

I’ve officially “drunk the Kool-Aid” and I’m convinced OpenEmbedded and Yocto are pretty awesome. I’ve had a blast building small Debian systems on PCEngines hardware in the past, and while I’m waiting for my Raspberry Pi to arrive I’ve been trying to learn the ins and outs of Yocto. The added bonus is that the XenClient team at Citrix uses OpenEmbedded for our build system, so this work can also fall under the heading of “professional development”.

Naturally the first task I took on was way too complicated. I made a bunch of great progress (more about that in a future post once I get it stable), but then I hit a wall that I ended up banging my head against for a full day. I posted a cry for help on the mailing list and didn’t get any responses, so I set out to remove as many moving parts as possible and find the root cause.

First things first: read the Yocto development manual and the Yocto reference for whatever release you’re using. This is essential because no one will help you until you’ve read and understood these 🙂

The software I’m trying to build uses raw Makefiles, none of that fancy autotools stuff. This can be a bit of a pain because, depending on the Makefiles, it’s not uncommon for them to make assumptions about file system paths. OpenEmbedded is all about cross compiling, so it wants to build and install software under all sorts of strange roots, and some Makefiles just can’t handle this. I ran into a few of these scenarios but nothing I couldn’t overcome.

Getting a package for my target architecture wasn’t bad but I did run into a nasty problem when I tried to get a native package built. From the searches I did on the interwebs it looks like there have been a number of ways to build native packages. The current “right way” is simply to have your recipe extend the native class. Thanks to XorA for documenting his/her new package workflow for that nugget.

BBCLASSEXTEND = "native"

After having this method blow up for my recipe I was tempted to hack together some crazy workaround. I really want to upstream the stuff I’m working on though, and I figured having crazy shit in my recipe to work around my misunderstanding of the native class was setting the whole thing up for failure. So instead I went back to basics and made a “hello world” program and recipe (included at the end of this post), hoping to recreate the error and figure out what I was doing wrong at the same time.

It took a bit of extra work but I was able to recreate the issue with a very simple Makefile. First the error message:

NOTE: package hello-native-1.0-r0: task do_populate_sysroot: Started
ERROR: Error executing a python function in /home/build/poky-edison-6.0/meta-test/recipes-test/helloworld/hello_1.0.bb:
CalledProcessError: Command 'tar -cf - -C /home/build/poky-edison-6.0/build/tmp/work/i686-linux/hello-native-1.0-r0/sysroot-destdir///home/build/poky-edison-6.0/build/tmp/sysroots/i
686-linux -ps . | tar -xf - -C /home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux' returned non-zero exit status 2 with output tar: /home/build/poky-edison-6.0/build/tmp/work
/i686-linux/hello-native-1.0-r0/sysroot-destdir///home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux: Cannot chdir: No such file or directory
tar: Error is not recoverable: exiting now
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors


ERROR: The stack trace of python calls that resulted in this exception/failure was:
ERROR:   File "sstate_task_postfunc", line 10, in 
ERROR:
ERROR:   File "sstate_task_postfunc", line 4, in sstate_task_postfunc
ERROR:
ERROR:   File "sstate.bbclass", line 19, in sstate_install
ERROR:
ERROR:   File "/home/build/poky-edison-6.0/meta/lib/oe/path.py", line 59, in copytree
ERROR:     check_output(cmd, shell=True, stderr=subprocess.STDOUT)
ERROR:
ERROR:   File "/home/build/poky-edison-6.0/meta/lib/oe/path.py", line 121, in check_output
ERROR:     raise CalledProcessError(retcode, cmd, output=output)
ERROR:
ERROR: The code that was being executed was:
ERROR:      0006:        bb.build.exec_func(intercept, d)
ERROR:      0007:    sstate_package(shared_state, d)
ERROR:      0008:
ERROR:      0009:
ERROR:  *** 0010:sstate_task_postfunc(d)
ERROR:      0011:
ERROR: (file: 'sstate_task_postfunc', lineno: 10, function: )
ERROR:      0001:
ERROR:      0002:def sstate_task_postfunc(d):
ERROR:      0003:    shared_state = sstate_state_fromvars(d)
ERROR:  *** 0004:    sstate_install(shared_state, d)
ERROR:      0005:    for intercept in shared_state['interceptfuncs']:
ERROR:      0006:        bb.build.exec_func(intercept, d)
ERROR:      0007:    sstate_package(shared_state, d)
ERROR:      0008:
ERROR: (file: 'sstate_task_postfunc', lineno: 4, function: sstate_task_postfunc)
ERROR: Function 'sstate_task_postfunc' failed
ERROR: Logfile of failure stored in: /home/build/poky-edison-6.0/build/tmp/work/i686-linux/hello-native-1.0-r0/temp/log.do_populate_sysroot.30718
Log data follows:
| NOTE: QA checking staging
| ERROR: Error executing a python function in /home/build/poky-edison-6.0/meta-test/recipes-test/helloworld/hello_1.0.bb:
| CalledProcessError: Command 'tar -cf - -C /home/build/poky-edison-6.0/build/tmp/work/i686-linux/hello-native-1.0-r0/sysroot-destdir///home/build/poky-edison-6.0/build/tmp/sysroots
/i686-linux -ps . | tar -xf - -C /home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux' returned non-zero exit status 2 with output tar: /home/build/poky-edison-6.0/build/tmp/wo
rk/i686-linux/hello-native-1.0-r0/sysroot-destdir///home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux: Cannot chdir: No such file or directory
| tar: Error is not recoverable: exiting now
| tar: This does not look like a tar archive
| tar: Exiting with failure status due to previous errors
|
|
| ERROR: The stack trace of python calls that resulted in this exception/failure was:
| ERROR:   File "sstate_task_postfunc", line 10, in 
| ERROR:
| ERROR:   File "sstate_task_postfunc", line 4, in sstate_task_postfunc
| ERROR:
| ERROR:   File "sstate.bbclass", line 19, in sstate_install
| ERROR:
| ERROR:   File "/home/build/poky-edison-6.0/meta/lib/oe/path.py", line 59, in copytree
| ERROR:     check_output(cmd, shell=True, stderr=subprocess.STDOUT)
| ERROR:
| ERROR:   File "/home/build/poky-edison-6.0/meta/lib/oe/path.py", line 121, in check_output
| ERROR:     raise CalledProcessError(retcode, cmd, output=output)
| ERROR:
| ERROR: The code that was being executed was:
| ERROR:      0006:        bb.build.exec_func(intercept, d)
| ERROR:      0007:    sstate_package(shared_state, d)
| ERROR:      0008:
| ERROR:      0009:
| ERROR:  *** 0010:sstate_task_postfunc(d)
| ERROR:      0011:
| ERROR: (file: 'sstate_task_postfunc', lineno: 10, function: )
| ERROR:      0001:
| ERROR:      0002:def sstate_task_postfunc(d):
| ERROR:      0003:    shared_state = sstate_state_fromvars(d)
| ERROR:  *** 0004:    sstate_install(shared_state, d)
| ERROR:      0005:    for intercept in shared_state['interceptfuncs']:
| ERROR:      0006:        bb.build.exec_func(intercept, d)
| ERROR:      0007:    sstate_package(shared_state, d)
| ERROR:      0008:
| ERROR: (file: 'sstate_task_postfunc', lineno: 4, function: sstate_task_postfunc)
| ERROR: Function 'sstate_task_postfunc' failed
NOTE: package hello-native-1.0-r0: task do_populate_sysroot: Failed
ERROR: Task 3 (virtual:native:/home/build/poky-edison-6.0/meta-test/recipes-test/helloworld/hello_1.0.bb, do_populate_sysroot) failed with exit code '1'
ERROR: 'virtual:native:/home/build/poky-edison-6.0/meta-test/recipes-test/helloworld/hello_1.0.bb' failed

So even with the simplest Makefile I could cause a native recipe build to blow up. Here’s the Makefile:

.PHONY : all clean install uninstall

PREFIX ?= $(DESTDIR)/usr
BINDIR ?= $(PREFIX)/bin

HELLO_src = hello.c
HELLO_bin = hello
HELLO_tgt = $(BINDIR)/$(HELLO_bin)

all : $(HELLO_bin)

$(HELLO_bin) : $(HELLO_src)

$(HELLO_tgt) : $(HELLO_bin)
	install -d $(BINDIR)
	install -m 0755 $^ $@

clean :
	rm $(HELLO_bin)

install : $(HELLO_tgt)

uninstall :
	rm -f $(HELLO_tgt)

And here’s the relevant install method from the bitbake recipe:

do_install () {
    oe_runmake DESTDIR=${D} install
}

Notice I’m using the DESTDIR variable to tell the Makefile the root (not just /) to install things under. This should work, right? It works for a regular package but not for a native one! This drove me nuts for a full day.

The solution to this problem lies in some weirdness in the Yocto native class when combined with the populate_sysroot method. The way I figured this out was by inspecting the differences in the environment when building hello vs hello-native. When building the regular package for the target architecture, variables like bindir and sbindir were what I would expect them to be:

bindir="/usr/bin"
sbindir="/usr/sbin"

but when building hello-native they get a bit crazy:

bindir="/home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux/usr/bin"
sbindir="/home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux/usr/sbin"

This is a hint at the source of the crazy path that staging is trying to tar up in the error message above. Further, if you look in the build directory for a regular target arch package you’ll see your files where you expect them, in ${D}sysroot-destdir/usr/bin, but for a native build you’ll see stuff in ${D}sysroot-destdir/home/build/poky-edison-6.0/build/tmp/sysroots/i686-linux/usr/bin. Pretty crazy, right? I’m sure there’s a technical reason for this but it’s beyond me.

So the way you can work around this is by telling your Makefiles about paths like bindir through the recipe. A fixed do_install would look like this:

do_install () {
    oe_runmake DESTDIR=${D} BINDIR=${D}${bindir} install
}

For more complicated Makefiles you can probably specify a PREFIX and set it equal to the ${prefix} variable, but YMMV. I’ll be trying this out to keep my recipes as simple as possible.
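
For a Makefile that installs everything under $(DESTDIR)$(PREFIX), the equivalent would look something like this (a sketch, untested):

do_install () {
    oe_runmake DESTDIR=${D} PREFIX=${prefix} install
}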

If you want to download my example the recipe is here. This will pull down the hello world source code and build the whole thing for you.

What does acpi_fakekeyd do?

In setting up SELinux on my laptop running Squeeze I’m taking a pretty standard approach. First off, I’m working from the packages in Sid maintained by Russell Coker, so most of the hard work has been done. There are a few programs, mostly specific to a laptop, that still aren’t in the right domains. We can see this by dumping out the running programs and their domains:

ps auxZ

Determining the “right domain” for a process is a bit harder, but there’s a pretty obvious place to start: no daemons should be running in initrc_t!

initrc_t is the domain given to scripts run by the init daemon, which is pretty much any script in /etc/init.d. If a daemon is still running in this domain after startup it likely means there was no transition rule in place to put it into a domain specific to that daemon. I figured I’d take these on alphabetically and started with acpi_fakekeyd 🙂
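
A quick way to pick targets is to just filter the process list for that domain:

# list anything still running in the generic init script domain
ps -eZ | grep initrc_t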

A policy for acpi_fakekeyd

All of the power management stuff like acpid runs in the apmd_t domain, so the first thing I tried was running acpi_fakekeyd in this domain. You can go through the trouble of adding the path /usr/sbin/acpi_fakekeyd to the apmd_t policy module, rebuilding it and reloading it (which really isn’t that hard these days), or you can take a shortcut like so:

echo "system_u:system_r:apmd_exec_t:s0" | sudo attr -S -s selinux /usr/sbin/acpi_setkeyd

This sets the label on the executable such that when init runs the start up script, the daemon will end up in the apmd_t domain.

Once the label is set you can restart the daemon using run_init, assuming your user is in a domain that can run init scripts (unconfined, admin, etc). If all goes well the daemon will end up running in the right domain. I then did what I thought was exercising the domain to see if it would cause any AVCs. This meant sending the daemon a few key events using the acpi_fakekey command directly, as well as putting my laptop to sleep and into hibernation (see the /etc/acpi/sleep.sh script). There weren’t any AVCs, so I concluded the apmd_t domain had all of the permissions the fakekey daemon needed. I was wrong, but we’ll get to that.
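
My test loop was roughly the following; the init script name and the acpi_fakekey argument are from memory, so treat them as placeholders and adjust for your system:

# restart the daemon through run_init so the domain transition applies
sudo run_init /etc/init.d/acpi-fakekey restart

# confirm it landed in apmd_t
ps -eZ | grep acpi_fakekeyd

# poke the daemon directly, then look for denials in the kernel log
acpi_fakekey 113
dmesg | grep -i avc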

acpi_fakekeyd in its own domain

I was really expecting a few denial messages, so I decided to put acpi_fakekeyd into its own domain with no privileges. The idea was to trigger some AVCs and get a feel for what exactly the daemon does.

The policy module I whipped up is super simple:
acpi_fakekeyd.te

policy_module(acpi_fakekeyd, 0.1)

########################################
#
# Declarations
#
type acpi_fakekeyd_t;
type acpi_fakekeyd_exec_t;
init_daemon_domain(acpi_fakekeyd_t, acpi_fakekeyd_exec_t)

acpi_fakekeyd.fc

/usr/sbin/acpi_fakekeyd --      gen_context(system_u:object_r:acpi_fakekeyd_exec_t,s0)

No interfaces yet, so the acpi_fakekeyd.if file was empty.
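
Building and loading the module uses the devel Makefile that ships with the refpolicy headers; the path below is where the Debian policy dev package puts it, IIRC:

# compile the .te/.fc into a loadable policy package
make -f /usr/share/selinux/devel/Makefile acpi_fakekeyd.pp

# load the module and re-label the daemon binary to the new type
sudo semodule -i acpi_fakekeyd.pp
sudo restorecon -v /usr/sbin/acpi_fakekeyd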

After restarting the daemon, checking that it’s in the right domain and exercising my ACPI system … there still weren’t any AVCs! Obviously I was missing something, and a bit of research turned up this bug report which explains pretty much everything.

acpi_fakekeyd deprecated

To save you a bunch of reading: toward the end of the discussion thread (about 8 months after the initial post) it’s identified that the functionality of acpi_fakekeyd is deprecated in kernels after 2.6.24. The functionality should instead be provided by an in-kernel driver, which my laptop (ThinkPad x61s) has.

So why is this daemon installed and running? If I disable it, ACPI on my laptop still works fine. But the acpi_support package, which is required to put my laptop to sleep, depends on the acpi_fakekey package. This is likely because the scripts provided by acpi_support call the acpi_fakekey application for backwards compatibility on some systems. That doesn’t make much sense to me though, since Squeeze ships with a 2.6.32 kernel.
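
The dependency is easy enough to confirm from apt (the hyphenated names are the actual Debian package names):

apt-cache depends acpi-support | grep -i fakekey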

The answer to the question I pose in the title of this post is: it doesn’t do anything on my system. I don’t even need to have it running, so I just shut it off. Problem solved I guess, and from a security perspective this is an even better solution than running it in its own SELinux domain: if it’s not running, it can’t do any damage. I’d rather be able to remove the package completely though.

Does anyone out there have a laptop that requires this daemon? I’m tempted to file a bug against the package … Anyway on to the next daemon 🙂

thinkpad keys on awesome WM

The last bit of configuration my laptop needed before I’d call it “usable” with the new window manager was related to the function keys. This doesn’t have anything to do with configuring the awesome window manager directly, but it does cover a few small configurations needed to get back some functionality I was used to Gnome providing by default.

Linux has great support for Thinkpads in general and my x61s specifically. With basic power management packages installed and the kernel thinkpad_acpi driver loaded, not only did S3 work fine, but the Thinkpad function key to put my laptop to sleep worked out of the box! The keys that didn’t work right away were an easy fix.

Screen Brightness

Traveling as much as I do, I find I use the manual controls for screen brightness quite a bit. On a plane it’s often really nice to be able to dial the screen brightness down to minimal levels to preserve your battery and your eyes. gnome-power-manager is the Gnome component that takes care of this, and it’s simple enough to install. Nothing interesting here except a new-found appreciation for how much Gnome does out of the box and why it’s so big.

Volume and Mute

Lastly, configuring the volume buttons was a must. If you’re just interested in getting them to work, install the gnome-settings-daemon and add a command to your rc.lua file to run it. I spent a little time getting to the bottom of it instead and learned a bit along the way.

Fire up a terminal and run the xev command to see the keyboard events from each key. A regular keyboard key will generate some output like:

KeyPress event, serial 29, synthetic NO, window 0x1600001,
    root 0x105, subw 0x0, time 316658902, (470,254), root:(472,689),
    state 0x0, keycode 24 (keysym 0x71, q), same_screen YES,
    XLookupString gives 1 bytes: (71) "q"
    XmbLookupString gives 1 bytes: (71) "q"
    XFilterEvent returns: False

KeyRelease event, serial 29, synthetic NO, window 0x1600001,
    root 0x105, subw 0x0, time 316658990, (470,254), root:(472,689),
    state 0x0, keycode 24 (keysym 0x71, q), same_screen YES,
    XLookupString gives 1 bytes: (71) "q"
    XFilterEvent returns: False

Those are the key press and release events triggered by typing a ‘q’. Pressing one of the volume buttons looks a little different:

KeyPress event, serial 28, synthetic NO, window 0x1000001,
    root 0x105, subw 0x0, time 316864564, (132,42), root:(134,477),
    state 0x0, keycode 121 (keysym 0x1008ff12, XF86AudioMute), same_screen YES,
    XLookupString gives 0 bytes: 
    XmbLookupString gives 0 bytes: 
    XFilterEvent returns: False

KeyRelease event, serial 28, synthetic NO, window 0x1000001,
    root 0x105, subw 0x0, time 316864684, (132,42), root:(134,477),
    state 0x0, keycode 121 (keysym 0x1008ff12, XF86AudioMute), same_screen YES,
    XLookupString gives 0 bytes: 
    XFilterEvent returns: False

You’ll notice that the keycode maps to the XF86AudioMute keysym this time instead of q, but XLookupString returns 0 bytes.

Basically the three volume buttons are associated with virtual X keys: XF86AudioMute, XF86AudioRaiseVolume and XF86AudioLowerVolume. It’s pretty straightforward to use xbindkeys to map these to alsa commands that mute, raise or lower the volume. My .xbindkeysrc looked like this:

"amixer  set Master 2dB+"
  XF86AudioRaiseVolume

"amixer set Master 2dB-"
  XF86AudioLowerVolume

"amixer set Master toggle"
  XF86AudioMute

Some prefer to link these keys to the PCM volume control. I found that toggle (as well as mute/unmute) doesn’t work for the PCM channel. I’m honestly not sure what the PCM channel is even for, so I reserve the right to change this in the future. There are lots of howtos out there where people implement shell hacks to fake a mute / unmute for the PCM channel using xbindkeys, if you’re interested enough to search.

So the above configuration and a quick entry in my rc.lua to kick off xbindkeys was enough to get this working. The one downside to using xbindkeys: it doesn’t have a cool little volume indicator that pops up to show the level when you press the keys 🙂

giving up on sound daemons

After whipping up a quick script to kill and restart jackd when my laptop goes into S3 (/etc/pm/sleep.d) I had a revelation: any processes connected to jackd would have to reconnect. They don’t, and sound is just lost. So for jackd to be usable on a laptop it needs to support S3 directly, and the version currently in the Debian unstable repositories doesn’t.
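
For what it’s worth, the hook itself was trivial; roughly this, dropped into /etc/pm/sleep.d (how jackd gets restarted depends entirely on how you launch it, so the resume half is a placeholder):

#!/bin/sh
# /etc/pm/sleep.d/50-jackd: bounce jackd around suspend/resume
case "$1" in
    suspend|hibernate)
        killall jackd
        ;;
    resume|thaw)
        # restart jackd however you normally start it, e.g.:
        su - jackuser -c "jackd -d alsa" &
        ;;
esac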

I gave esound a try in place of jack, but with totem esd ran the CPU up to 20% for a Flash movie, which is pretty nuts. It had a noticeable impact on video playback, so esound got uninstalled as quickly as it was installed.

Finally I just fell back to using alsa directly. This actually worked perfectly and I’m not sure why I didn’t just go this route in the first place.