# Talk at Xen Developer Summit 2013

UPDATE: Here’s the link: http://www.youtube.com/watch?v=6Q8mlTBn-ZI. I still haven’t been able to bring myself to actually watch it but I’m sure it’s great :

Just got back from the 2013 Xen Developer Summit where I gave a talk on a few interesting (to me at least) things. If you’re interested you can find my abstract here. My focus was naturally on SELinux / XSM stuff. Mostly my talk focused on the sVirt implementation in XenClient XT and another fun application of the architecture to our management stuff.

Had good chat with a guy from Amazon afterward about all of the other evil stuff someone could do if they compromised QEMU. So while sVirt prevents the specific scenario presented I’ve no doubt there are other hazards. He was specifically concerned over the Xen privcmd driver & the hypercalls it could make. Hard to disagree as QEMU with root permissions in dom0 can execute any hypercall it wants. The only way to address this (other than stubdoms) is to deprivilege QEMU to prevent it from making hypercalls. That would probably require some code-changes in QEMU so it’s no small task.

I also touched briefly on the design for an inter-VM communication (IVC) mechanism that was floated to xen-devel this summer. In XT we have an IVC called ‘V4V’ that isn’t acceptable to upstream. When it came to our XSM policy however V4V had some favorable properties in that we created a new object in the hypervisor that was a ‘first-class’ object in the policy.

The proposal uses the same model as the front/back drivers so there would be no new object specific to the IVC. This means there wouldn’t be way to differentiate the IVC from any other front/back driver. The purpose of the talk was to point this out and hopefully solicit some discussion. Got an even better conversation going on this point so hopefully I’ll have some fun stuff to report on this front soon.

# sVirt in XenClient

It’s been 5 months since my last post about my on-going project required by my masters program at SU. With the hope of eventually getting my degree, this is my last post on the subject. In my previous post on this topic I described a quick prototype I coded up to test an example program and SELinux policy to demonstrate the sVirt architecture. This was a simple example of how categories from the MCS policy can be used to separate multiple instances of the same program. The logical step after implementing a prototype is coding up the real thing so in this post I’ll go into some detail describing an implementation of the sVirt architecture I coded for the XenClient XT platform. While it may have taken me far too long to write up a description of this project, it’s already running in a commercial product … so I’ve got that going for me 🙂

## Background

XenClient is a bit different than the upstream Xen in that the management stack has been completely rewritten. Instead of the xend process which was written in python, XenClient uses a toolstack that’s rewritten in Haskell. This posed two significant hurdles. First I’ve done little more than read the first few pages from a text book on Haskell so the sVirt code, though not complex, would be a bit over my skill level. Second SELinux has no Haskell bindings which would be required by the sVirt code.

Taking on the task of learning a new functional programming language and writing bindings for a relatively complex API in this language would have taken far longer than reasonable. Though we do intend to integrate sVirt into the toolstack proper, putting this work on the so called “critical path” would have been prohibitively expensive. Instead we implemented the sVirt code as a C program that is interposed between the toolstack and the QEMU instances it is intended to separate. Thus the toolstack (the xenmgr process) invokes the svirt-interpose program each time a QEMU process needs to be started for a VM. The svirt-interpose process then does all of the necessary functions to prepare the environment for the separation of the QEMU instance requested from the others currently running.

The remainder of this document describes the svirt-interpose program in detail. We begin by describing the interfaces down the call chain between the xenmgr, svirt-interpose and QEMU.
We then go into detail describing the internal workings of the svirt-interpose code. This includes the algorithm used to assign categories to QEMU processes and to label the system objects used by these processes. We conclude with a brief analysis of the remaining pieces of the system that could benefit from similar separation. In my first post on this topic I describe possible attacks we’re defending against so I’ll not repeat that here.

## Call Chain

As we’re unable to integrate the sVirt code directly into the toolstack we must interpose the execution of the sVirt binary between the toolstack and QEMU. We do this by having the toolstack invoke the sVirt binary and then have sVirt invoke QEMU after performing the necessary SELinux operations. For simplicity we leave the command line that the toolstack would normally pass to QEMU unchanged and simply extract the small piece of information we need from it in the sVirt code. All sVirt requires to do it’s job is the domain id (domid) of the VM it’s starting a QEMU instance for. This value is the first parameter so extracting it is quite simple.

The final bit that must be implemented is in policy. Here we must be sure that the policy we write reflects this call chain explicitly. This means removing the ability for the toolstack (xend_t) to invoke QEMU (qemu_t) directly and replacing this with allowing the toolstack to execute the svirt-interpose program (svirt_t) while allowing the svirt-interpose domain to transition to the QEMU domain. This is an important part of the process as it prevents the toolstack from bypassing the svirt code. Many will find protections like this superfluous as it implies protections from a malicious toolstack and the toolstack is a central component of the systems TCB. There is a grain of truth in this argument though it represents a rather misguided analysis. It is very important to limit the permissions granted to a process to limit a possible vulnerability even if the process we’re confining is largely a “trusted” system component.

## Category Selection

The central piece of this architecture is to select a unique MCS category for each QEMU process and assign this category to the resources belonging to said process. Before a category can be assigned to a resource we must first chose the category. The only requirement we have when selecting categories is that they are unique (not used by another QEMU process).
Thus there is no special meaning in a category number. Thus it makes sense to select the category number at random.

We’re presented with an interesting challenge here based on the nature of the svirt-interpose program. If this code was integrated with the toolstack directly it would be reasonable to maintain a data structure mapping the running virtual machines to their assigned categories. We could then select a random category number for a new QEMU instance and quickly walk this in-memory structure to be sure this category number hasn’t already been assigned to another process. But as was described previously, the svirt-interpose code is a short lived utility that is invoked by the toolstack and dies shortly after it starts up a QEMU process. Thus we need persistent storage to maintain this association.

The use of the XenStore is reasonable for such data and we use the key ‘selinux-mcs’ under the /local/domain/\$domid node (where \$domid is the domain id of a running VM) to store the value. Thus we randomly select a category and then walk the XenStore tree examining this key for each running VM. If a conflict is detected a new value is selected and the search continues. This is a very naive algorithm and we discuss ways in which it can be improved in the section on future work.

## Object labeling

Once we’ve successfully interposed our svirt code between the toolstack and QEMU and implemented our category selection algorithm we have two tasks remaining. First we must enumerate the objects that belong to this QEMU instance and label them appropriately. Second we must perform the steps necessary to ensure the QEMU process will be labeled properly before we fork and exec it.

Determining the devices assigned to a VM by exploring the XenStore structures is tedious. The information we begin with is the domid of the VM we’re configuring QEMU for. From this we can examine the virtual block devices (VBDs) that belong to this VM but the structure in the VM specific configuration space rooted at /local/domain/\$domid only contains information about the backend hosting the device. To find the OS objects associated with the device we need to determine the backend, then examine the configuration space for that backend.

We begin by listing the VBDs assigned to a VM by enumerating the /local/domain/\$domid/device/vbd XenStore directory. This will yeild a number of paths in of the form /local/domain/\$domid/device/vbd/\$vbd_num where \$vbd_num is the numeric id assigned to a virtual block device. VMs can be assigned any number of VBDs so we must process all VBDs listed in this directory.

From these paths representing each VBD assigned to a VM we get closer to the backing store by extracting the path to the backend of the split xen block driver. This path is contained in the key /local/domain/\$domid/device/vbd/\$vbd_num/backend. Once this path is extracted we check to see if the device in dom0 is writable by reading the ‘mode’ value. If the mode is ‘w’ the device is writable and we must apply the proper MCS label to it. We ignore read only VBDs as XenClient only assigns CDROMs as read only, all hard disks are allocated as read/write.

Once we’ve determined the device is writable we now need to extract the dom0 object (usually a block or loopback device file) that’s backing the device. The location of the device path in XenStore depends on the backend storage type in use. XenClient uses blktap processes to expose VHDs through device nodes in /dev and loopback devices to expose files that contain raw file systems. If a loopback device is in use the path to the device node will be stored in the XenStore key ‘loop-device’ in the corresponding VBD backend directory. Similarly if a bit more cryptic, the device node for a blktap device for a VHD will be in the XenStore key ‘params’.

Once these paths have been extracted the devices can be labeled using the SELinux API. To do so, we first need to know what the label should be. Through the SELinux API we can determine the current context for the file. We then set the MCS category calculated for the VM on this context and then change the file context to the resultant label. Important to note here is that both a sensitivity level and a category must be set on the security context. The SELinux API doesn’t shield us from the internals of the policy here and even though the MCS policy doesn’t reason about sensitivities there is a single sensitivity defined that must be assigned to every object (s0).

Assigning a category to the QEMU process is a bit different. Unlike file system objects there isn’t an objct that we can query for a label. Instead we can ask the security server to calculate the resultant label of a transition from the current process (sVirt) to the destination process (QEMU). There is an alteernative method available however and this allows us to deterine the type for the QEMU process directly. SELinux has added some native support for virtualization and one such bit was the addition of the API call ‘selinux_virtual_domain_context_path’. This function returns the path of a file in the SELinux configuration directory that contains the type to be assigned to domains used for virtualization.

Once we have this type the category calculated earlier is then applied and the full context is obtained. SELinux has a specific API call that allows the caller to request the security server apply a specific context to the process produced by the next exec performed by the calling process (setexeccon). Once this has been done successfully the sVirt process cleans up the environment (closes file descriptors etc) and execs the QEMU program passing it the unmodified command line that was provided by the toolstack.

## Conclusion

Applying an MCS category to a QEMU process and its resources is fairly straight forward task. There are a few details that must be attended to to ensure that proper error handling is in place but the code is relatively short (~600 LOC) and quite easy to audit. There are some places where the QEMU processes must overlap however. XenClient is all about multiplexing shared hardware between multiple virtual machines on the same PC / Laptop. Sharing devices like the CD-ROM that is proxied to clients via QEMU requires some compromise.

As we state above the CD-ROM is read-only so an MCS category is not applied to the device itself but XenClient must ensure the accesses to the device are exclusive. This is achieved by QEMU obtaining an exclusive lock on a file in /var/lock before claiming the CD-ROM. All QEMU processes must be able to take this lock so the file must be created without any categories. This may seem like a minor detail but it’s quite tedious to implement in practice and it does represent path for data to be transmitted from one QEMU process to another. Transmission through this lock file would require collusion between QEMU processes so it’s considered a minimal threat.

## Future Work

This is my last post in this series that has nearly spanned a year. I’m a bit ashamed it’s taken me this long to write up my masters project but it did end up taking on a life of its own getting me a job with Citrix on the XenClient team. There’s still a lot of work to be done and I’m hoping to continue documenting it here. Firstly I have to collect the 8 blog posts that document this work and roll them up into a single document I can submit to my adviser to satisfy my degree requirements.

In parallel I’ll be working all things XenClient hopefully learning Haskell and integrating the sVirt architecture directly into our toolstack. Having this code in the toolstack directly will have a number of benefits. Most obviously it’ll remove a few forks so VM loading will be quicker. More interestingly though it will open up the possibility of applying MCS category labeling to devices (both PCI and USB) that are assigned to VMs. The end goal, as always, is strengthening the separation between the system components that need to remain separate thus improving the security of the system.

# sVirt Simulation Demo

In my last post on this topic I gave a quick description of a simulation of the sVirt architecture. Talking about it is only half the work. In this post I’ll show it in action and interpret the output as it relates to the separation goals.

## Building and Installing

After you clone the git repo from my last post you’ll have all of the source code and policy for the simulation. Typing `make` in the root of the tree should build both the policy and the simulation code. Install it with `make install`.

There are differences between different Linux distros and how they do their SELinux stuff. This was developed and tested on Debian Squeeze.

## Interface

The simulation is a bit sparse so the interface is just a textual one. Run it on the console by running it as root:

```svirt-sim
```

The simulation is run as root because we’re simulating a virtualization management stack like `libvirt` or `xend` and makes and manages mock VHDs in `/var/lib/svirt-sim/images`.

The initial output you’ll see just a prompt of two hashes `##` and a message indicating you can get a list of commands by pressing `?`. With these commands you can interactively perform actions typical for a virtualization stack: create, delete, start, stop and show information about virtual machines:

```press ? for a list of commands
##:?
c : create a VM
d : delete a VM
h : start a VM
k : stop a VM
s : show objects and labels
q : quit
```

## Creating VMs

First things first, we need to create two VMs. When we create a VM the `svirt-sim` process creates a VHD for it and allocates an MCS category for the mock QEMU instance (that we call `not-qemu`) that will be started when the VM is started. The first one we’ll create will do what we expect it to do: access it’s VHD file simulating disk access. We’ll call this VM ‘good-one’ since it’s well behaved. The second VM we’ll create will be the ‘bad-one’ and it will attempt to violate confinement and access VHDs belonging to the other VM. Here’s the output we’ll see:

```##:c
what's the name of the VM?: good-one
what's the vm's disposition [good|bad]?: good
creating: /var/lib/svirt-sim/images/good-one.vhd
##:c
what's the name of the VM?: bad-one
```

Now that we’ve created our VMs we can examine the internal state of the management stack by pressing ‘s’:

```##:s
======================
VMs:
VM qemu PID   0
VM Status:    Stopped
mcs category:     s0:c850
prev vhd context: (null)
VM Name:        good-one
VHD Name:     /var/lib/svirt-sim/images/good-one.vhd
Disposition:  good
VM qemu PID   0
VM Status:    Stopped
mcs category:     s0:c440
prev vhd context: (null)
======================
=============================
dumping security context list
mcs: s0:c850
mcs: s0:c440
=============================
```

We can also see the mock VHDs created by the management stack and examine their security context:

```-rw-r-----. 1 root root unconfined_u:object_r:svirt_sim_image_t:s0:c0 0 Sep  3 23:27 bad-one.vhd
-rw-r-----. 1 root root unconfined_u:object_r:svirt_sim_image_t:s0:c0 0 Sep  3 23:27 good-one.vhd
```

We use the first category (c0) as the context for VMs that aren’t running. No VM should ever be run with this category so the VHD of a running VM should never be accessible to a `not-qemu` instance.

## Running VMs

The VMs created above can be run by pressing ‘h’. Once they’re running press ‘s’ again to see the change in the internal state of the management stack. We’ll now see pid of the mock QEMU process started.

```##:h
what's the name of the VM?: good-one
starting vm: good-one
security_start for vm with mcs: s0:c440
started child with pid 1298
##:h
what's the name of the VM?: bad-one
security_start for vm with mcs: s0:c850
started child with pid 1300
##:s
======================
VMs:
VM qemu PID   1300
VM Status:    Running
mcs category:     s0:c850
prev vhd context: unconfined_u:object_r:svirt_sim_image_t:s0:c0
VM Name:        good-one
VHD Name:     /var/lib/svirt-sim/images/good-one.vhd
Disposition:  good
VM qemu PID   1298
VM Status:    Running
mcs category:     s0:c440
prev vhd context: unconfined_u:object_r:svirt_sim_image_t:s0:c0
======================
=============================
dumping security context list
mcs: s0:c850
mcs: s0:c440
=============================
```

Now that the simulated “VMs” are running we can view their contexts and those of their VHDs to be sure they match up to the internal state of the svirt-sim stack:

```# ls -l /var/lib/svirt-sim/images
-rw-r-----. 1 root root system_u:object_r:mcs_server_image_t:s0:c850 0 Sep  3 23:27 bad-one.vhd
-rw-r-----. 1 root root system_u:object_r:mcs_server_image_t:s0:c440 1 Sep  3 23:28 good-one.vhd
```
```ps auxZ | grep not-qemu
system_u:system_r:not_qemu_t:s0:c440 root 1298  0.0  0.1   1664   544 ?        Ss   23:28   0:00 /sbin/not-qemu --good --file=/var/lib/svirt-sim/images/good-one.vhd
system_u:system_r:not_qemu_t:s0:c850 root 1300  0.0  0.1   1696   544 ?        Ss   23:28   0:00 /sbin/not-qemu --bad --file=/var/lib/svirt-sim/images/bad-one.vhd
```

So far the output looks as we’d expect: the category assigned to the running `not-qemu` process matches the category of it’s VHD file.

## Testing Confinement

Built in to this simulation is a simple confinement test. The `not-qemu` process has an option I call it’s disposition. If started with the option `--good` it will only attempt to access the VHD file it’s been assigned. If started with the `--bad` option then it will cycle through the directory containing all of the VHDs and it will attempt to access all of them. This is meant to simulate a malicious QEMU process reading the disk of another VM. Above we’ve started two VMs, one good and one bad. We can see the evidence of their behavior in the system logs.

The syslog on my system shows the following trace of accesses made by each of the `not-qemu` processes. I’ve removed the time stamps but the messages should be considered sequential.

```/sbin/not-qemu[1298]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1298]: not-qemu with pid: 1298 still alive
/sbin/not-qemu[1300]: new file path: /var/lib/svirt-sim/images/.
/sbin/not-qemu[1300]: opening VHD file: /var/lib/svirt-sim/images/.
/sbin/not-qemu[1300]: open failed: Is a directory
/sbin/not-qemu[1298]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1298]: not-qemu with pid: 1298 still alive
/sbin/not-qemu[1300]: new file path: /var/lib/svirt-sim/images/..
/sbin/not-qemu[1300]: opening VHD file: /var/lib/svirt-sim/images/..
/sbin/not-qemu[1300]: open failed: Is a directory
/sbin/not-qemu[1298]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1298]: not-qemu with pid: 1298 still alive
/sbin/not-qemu[1300]: not-qemu with pid: 1300 still alive
/sbin/not-qemu[1298]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1298]: not-qemu with pid: 1298 still alive
/sbin/not-qemu[1300]: new file path: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1300]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1300]: open failed: Permission denied
kernel: [ 2678.904913] type=1400 audit(1315107146.430:22): avc:  denied  { read write } for  pid=1300 comm="not-qemu" name="go
od-one.vhd" dev=sda3 ino=1444187 scontext=system_u:system_r:not_qemu_t:s0:c850 tcontext=system_u:object_r:mcs_server_image_t:s
0:c440 tclass=file
/sbin/not-qemu[1298]: opening VHD file: /var/lib/svirt-sim/images/good-one.vhd
/sbin/not-qemu[1298]: not-qemu with pid: 1298 still alive
/sbin/not-qemu[1300]: new file path: /var/lib/svirt-sim/images/.
/sbin/not-qemu[1300]: opening VHD file: /var/lib/svirt-sim/images/.
/sbin/not-qemu[1300]: open failed: Is a directory
/sbin/not-qemu[1298]: not-qemu with pid 1298 stopping on user request
/sbin/not-qemu[1298]: not-qemu with pid 1298 shutting down
/sbin/not-qemu[1300]: not-qemu with pid 1300 stopping on user request
/sbin/not-qemu[1300]: not-qemu with pid 1300 shutting down```

There’s a bit of noise here in that ‘bad’ `not-qemu` instances loop through the `images` directory contents and don’t differentiate between files and directories so it will attempt to open and read the special directories `.` and `..` which I’ll clean up in the future.

The important bits are that we can see the ‘bad’ `not-qemu` instance with PID 1300 attempt to access a VHD that doesn’t belong to it. This results in the operation failing and an AVC being displayed.

## Conclusion

Whenever I’m writing SELinux policy I’m always looking for ways to test it. Policy analysis that proves a policy fulfills a specific security goal is the holy grail of security and this simulation isn’t policy analysis. Instead it’s just a simple simulation giving an example of what we’ve discussed using logical representation of the `mlsconstrain` statements in the SELinux MCS policy. Specifically, the category component of a processes context must dominate that of the object it’s accessing. We’ve represented this in logic previously now this simulation shows that the security server in the Linux kernel really does enforce this constraint.

Another interesting use of this simulation would be to use the real QEMU policy from the reference policy and a specifically crafted binary to push the limits of the policy. This binary could contain any number of known ways information can be leaked across QEMU instances. If anyone out there tries this I’d love to hear about it.

# sVirt-like prototype

We’re getting close to the end of my on-going series exploring the SELinux `mlsconstrain`. Now that we’ve gone though and used some simple logic to reason through access control decisions it’s time for a simple and practical application.

## Background

In my first post under my MastersProject tag I laid out some of the justification for why separating QEMU instances with SELinux policy in a Xen dom0 is important. To reiterate: my goal is to ensure that no qemu instance can read or write to the virtual disk belonging to another qemu instance. These are the same goals as the sVirt [1]
project which is responsible for the SELinux driver in libVirt [2]. We’ll use the same MLS categories we’ve been discussing to achieve the desired separation and we’ll validate our design with the logic previously discussed.

## Requirements

A simulation like this requires a system with a functional SELinux policy (running in enforcing mode). We assume that the user is running in an unconfined domain as this is how my development system is configured.

## Simulation Design

As always implementing a prototype/simulation first before starting in on the actual implementation is always a good idea. For our purposes this simulation will serve to familiarize us with the necessary bits of the SELinux API required to achieve our goals (will ease debugging later) and to prove to us through testing that our prototype can in fact fulfill the requirements. The simulation has three main parts:

### SELinux-specific code

The central piece of this work is the code specific to managing SELinux labels on virtual disks (vhd files) and on running QEMU processes. We select a unique category for each running VM and the properties of these categories provide us with the separation we desire. There’s no primitive to perform this task explicitly so we generate unique category numbers using a naive algorithm.

Each time a QEMU instance is started a category number is generated at random and a list of allocated categories is searched to prevent duplicates. If the category is already in use, another number will be generated and the list searched again. This continues until an unused category is found. NOTE: This is the least efficient algorithm possible but for a simulation we only need a functional algorithm, not a perfect one. This would be a prime candidate for improvement in a real implementation.

The algorithm for generating unique categories I’ve described above relies on a test for equality. Remember our previous discussion of the `mlsconstrain` on `read` and `write` operations. `Write` permissions required that the subject and object labels be equal. `Read` permissions required that the subject label dominate the object label.

With this in mind it may seem that testing for equality alone wouldn’t be sufficient. But since an equality test is more strict than a test for dominance it is sufficient to test for equality only.

Finally the SELinux specific code must ensure that the unique category we’ve just generated is applied to the necessary subjects and objects. Firstly the VHD files belonging to the QEMU instance must be labeled with this category. Selecting the proper label, adding the category and changing the label of the VHD file requires the use of libselinux and the code can be found in the prototype sources at src/sec-selinux.c. We reserve the first category (c0) for VHD files that are currently not in use and ensure that no running QEMU processes ever get this category.

The QEMU process itself must also be labeled with the unique category when it is exec’d. Determining the label for the process is much the same as for the VHD file, but we must call the `setexeccon` function before forking the QEMU instance instead of directly manipulating the label on the running process.

While the man pages for these API calls are a great documentation source I’ve found that currently on Debian they are incomplete. Specifically functions like `selinux_virtual_domain_context_path` are undocumented in the man pages. A better source as of now can be found at the SELinux Wiki API page [3].

### Mock QEMU Binary

Our prototype needs a binary to start up and test the limits of the confinement we’ve set up. We don’t intend to test the QEMU reference policy, only to test the limits of the confinement properties of the categories we’re assigning to each QEMU instance. As such we don’t want to use QEMU itself. Instead we want a simple binary that attempts to exceed its intended behavior.

As we’re aiming to limit access to the VHD files assigned to individual QEMU instances we want our mock QEMU to attempt to access the VHDs that don’t belong to it. We define two modes of operation for this program. First is what we term “good behavior” in which the process will only read and write to a file we specify on it’s command line. The second mode of operation we term “bad behavior” in which we specify a file on it’s command line again, but this time it will attempt to read and write to all files in that same directory.

To allow us to observe these fake QEMU instances at run-time I’ve added a number of syslog calls. These document the success or failure of attempted actions. Using this output we can observe the effectiveness of the protections offered by this system. This code is in the file: src/not-qemu.c

### VM Driver

The final component is a text driven interface for manipulating the simulation. This component takes the place of the standard virtualization tool stack which creates, starts, stops and destroys VMs. We expose four primary actions the user may choose:

1. Create VM: create internal representation of a VM including SELinux label and mock VHD file.
2. Start VM: launch the mock QEMU process for a VM in the assigned SELinux domian.
3. Stop VM: stop the mock QEMU process.
4. Delete VM: free up the assigned SELinux category and mock VHD file for the specified VM.

With an interface like this the user can control the simulation explicitly. Tailing the output from syslog will allow the user to observe the behavior of the simulation in real time where they can judge for themselves whether or not the confinement is effective.

## Policy

The policy for this simulation contains all of the necessary permissions for each component to function. There are also a number of permissions that are specific to the simulation. Any implementation directly in a tool stack won’t require any special permissions for SELinux users or groups as the stack will run as the system user in system role which is also the case for QEMU processes.

Our simulation is meant to be interactive and we assume that the user is unconfined. The user and role for the simulated qemu instances will be system_u and system_r respectively so some additional permissions are required by the simulation driver to allow changing the user and role label components when the qemu simulation is started and when labeling VHD files. This is simply a side effect of making the simulation interactive for the unconfined user.

The interesting parts are as follows:

```allow mcs_server_t self:process { setexec };
allow mcs_server_t mcs_server_image_type:file { relabelfrom relabelto };

ifdef(`enable_mcs',`
range_transition mcs_server_t mcs_server_image_t:file s0:c0;
')
mcs_server_manage_images(mcs_server_t)

```

You can see that we’ve defined two types, one for the driver (mcs_server_t) and one for the VHD files used by the simulated qemu instances (mcs_server_image_t). The server can manage the VHD files, it can read default contexts (required to use the libselinux API calls we need), and it can read from /dev/urandom which our category selection algorithm requires. The only nuance here is the `range_transition` which ensures that when the mcs_server_t type creates a new VHD file it is created with the proper context.

The permissions required for interaction between the mcs-server and the not-qemu instances is implemented as an interface in the mcs-server module (interface):

```interface(`mcs_server_domain',`
gen_require(`
role system_r, unconfined_r;
type mcs_server_t;
')

type \$1_t;
domain_type(\$1_t)
role system_r types \$1_t;
allow unconfined_r system_r;

type \$1_exec_t;
files_type(\$1_exec_t)

allow mcs_server_t \$1_t:process { getattr getsched setsched signal signull sigkill };
domtrans_pattern(mcs_server_t, \$1_exec_t, \$1_t)

mcs_server_rw_images(\$1_t)

# Only required in example where user directly invokes mcs-server.
#   In actual implementation user will be system_u for source and
#   destination domains so no exemption will be required.
domain_user_exemption_target(\$1_t)
role_transition unconfined_r \$1_exec_t system_r;
')```

This is a slightly strange pattern as the not-qemu module pretty much calls this one interface and that’s it. This allows the mcs-server to transition to the calling domain through an executable. It also allows the calling domain to read and write to read and write VHD files. There are two permissions at the end of the interface that are only required for our interactive simulation and they’ll be removed in a real implementation.

## Installing

This simulation and the accompanying SELinux policy can be obtained from a public git treen on this server: git://twobit.us/svirt-sim.git. Just download it, build it and give it a try. Be sure to read the requirements above and be sure you’ve set all that up because there’s a lot of moving parts behind the scenes with the build system and stuff.

## Conclusion

So that’s a brief description of a simulation implementing the sVirt design in an interactive prototype. It’s still very rough but has been extremely useful as a learning tool for myself. Next time I’ll do a walk through of the simulation and discuss some techniques we can use to be sure it’s functioning as expected.

## References

1 sVirt Requirements Document: http://selinuxproject.org/page/Svirt_requirements_v1.0
2 SELinux sVirt confinement in libVirt: http://libvirt.org/drvqemu.html#securitysvirt
3 API Summary for libselinux: http://selinuxproject.org/page/LibselinuxAPISummary

# Understanding Multi-Level Security part 4

In my last post on this topic, we got into what I would consider the second half of Multi-Level Security (MLS). Here we discussed categories, also known as compartments and how they relate to the security model itself. We then extended the short-hand logic adapted from Chu and Older to allow for reasoning through access control decisions regarding categories.

I’ll be following the format of my previous two posts under this tag, so in this post we’ll be getting our hands dirty in the SELinux MLS policy. We’ll go through these constraints and work through them using the logic hopefully making some sense of the rules and the policy they implement.

## SELinux MLS Constraints

I think I covered enough background on SELinux to just dive into the specific constraints relevant to the policy we’ve been discussing. I’ve used file read / write in previous examples so we’ll stick to that again in this post. We’ll start with simple read case, then move on to conclude the discussion begun in my last post about constraints on categories for subjects writing data to objects. We may even see the “high-water mark” again 🙂

Let’s begin with the SELinux `mlsconstraint` for read operations on files:

```# the file "read" ops (note the check is dominance of the low level)
mlsconstrain { dir file lnk_file chr_file blk_file sock_file fifo_file } { read getattr execute }
(( l1 dom l2 ) or
(( t1 == mlsfilereadtoclr ) and ( h1 dom l2 )) or
( t1 == mlsfileread ) or
( t2 == mlstrustedobject ));```

In a previous post I went blow by blow through each line of the constraint; we won’t bother with that again. Instead let’s look at the first line, which embodies the simple security property, and frame this in the logic we’ve recently expanded to reason over categories.

When we say that one label dominates another as in the `dom` operator from the constraint, both the sensitivity and the category component of the label must dominate. For the sensitivity we’ve established that the label set is a partial order where the $le_{s}$ operator denotes dominance:

$
Unclassified le_{s} Secret le_{s} Top Secret
$

In my last post we discussed how dominance is proven for a category set. Specifically, the category set A dominates another set of categories B if A is a superset of B. We represent this in the logic with the same operator using a different subscript: $le_{c}$ `c` to denote category instead of `s` for sensitivity.

With this new operator we can formalize the `dom` operator from the `mlsconstrain` but, if you’ve got a good eye for detail you’ve probably noticed that our logic doesn’t have the facilities to deal with SELinux MLS ranges. The MLS parameters from the `mlsconstrain` statements are either the high or low portion of the MLS range. To accommodate this we add subscripts `l` and `h` to the $slev$ and $cat$ operators to access the low and high parts of the range respectively.

With these minor modifications we can represent the full logical expression that must be satisfied in order to conclude a read operation should be granted. Let’s call this implication the `read implication` for reference.

$
(( slev_{l}(O) le_{s} slev_{l}(S) ) land ( cat_{l}(O) le_{c} cat_{l}(S) )) supset read
$

## A Proof using the Read Implication

Any reasoning using this system must be done in the context of a specific example. Let’s construct one for a subject and object then work our way through using the read implication.

Our subjects MLS range will be:

$
Unclassifid - Secret:{ C_{0}, C_{1} }
$

And our objects MLS range will be only a single level as it’s very rare for a passive object on disk to have a high and low MLS component. We can think of this as the low and high MLS levels being the same:

$
Secret:{ C_{1} }
$

I’m not feeling very imaginative today as you can see by my categories simply being $C_{0}$ and $C_{1}$. Regardless we’ve got enough information to do begin a proof with the aims to satisfy the left side of the read implication, which, in plain english, requires that the subjects low MLS label must dominate that of the object.

At this point the logic becomes a mechanical exercise in which we simply show that the `dom` relationship holds … I can see you getting bored but don’t worry, there is a twist but let’s start with our assumptions restated from above in something closer to the form we’d want in a proof:

$
slev_{l}(S) = Unclassified \
slev_{l}(O) = Secret \
cat_{l}(S) = { } \
cat_{l}(O) = { C_{1} }
$

Here’s the twist: we’ve got no way to reasonably conclude $read$ from these assumptions. This is because we would need to derive the left side of the read implication from which we could conclude $read$. We can however see from our given labels that:

$
Secret notle_{s} Unclassified \
slev_{l}(O) notle_{s} slev_{l}(S)
$

and

$
{} notsupset { C_{1}} \
cat_{l}(O) notle_{c} cat_{l}(S)
$

Both the sensitivity and the category of the subject’s low MLS component DO NOT dominate that of the object. So what’s a subject to do?

## Current Level vs Clearance

We need the subjects low MLS level to increase. The mechanics of how this happens and how it’s done in SELinux are out of scope here (that’s how I get out of describing something complicated) but let’s assume there’s a safe way for a subject to do this and that they can’t simply do so at will.

The low MLS level is often referred to as the users “current” level, and the high MLS level as the users “clearance”. The term “high-water mark” comes from the analogy made between the users current level rising as a result of subject actions and the mark made by rising water on a surface.

But I’m digressing. Now that our user can safely increase their current/low MLS level we can get back to our example.

Let’s assume that our user can increase their low/current MLS label to the minimum possible for their access to be granted, assuming of course it will never exceed their clearance. We now use these values as the assumptions in our proof:

$
slev_{l}(S) = Secret \
slev_{l}(O) = Secret \
cat_{l}(S) = { C_{0}, C_{1} } \
cat_{l}(O) = { C_{1}}
$

We then use these to show that the read should be allowed:

$
Secret le_{s} Secret \
slev_{l}(O) le_{s} slev_{l}(S) \
{ C_{0}, C_{1} } supseteq { C_{1} } \
cat_{l}(O) le_{c} cat_{l}(S) \
( slev_{l}(O) le_{s} slev_{l}(S) ) land ( cat_{l}(O) le_{c} cat_{l}(S) ) \
(( slev_{l}(O) le_{s} slev_{l}(S) ) land ( cat_{l}(O) le_{c} cat_{l}(S) )) supset read \
$

This is only the first of several logical constraints from the `mlsconstraint` for file read operations. In a previous post I’ve tried to fit SELinux types and comparisons there of into this logical framework but it’s not a very natural fit. Since this post is already getting close to 1200 words and we still haven’t talked about the constraint on write operations I’ll put off the discussion of types for now.

## File Write MLS Constraint

Now that we’ve examined the read operation, it’s time to take on writing to files. The SELinux constraint for this case is as follows:

```mlsconstrain { file lnk_file fifo_file dir chr_file blk_file sock_file } { write create setattr relabelfrom append unlink link rename mounton }
(( l1 eq l2 ) or
(( t1 == mlsfilewritetoclr ) and ( h1 dom l2 ) and ( l1 domby l2 )) or
(( t2 == mlsfilewriteinrange ) and ( l1 dom l2 ) and ( h1 domby h2 )) or
( t1 == mlsfilewrite ) or
( t2 == mlstrustedobject ));```

Like before we’re going to ignore all except the first part of the constraint: `l1 eq l2`. This is the rule that implements the expected “high-water mark” behavior. The remaining parts of the constraint are relaxations of this rule and should be used with caution.

You’ll notice that the constraint for writing differs from reading. It doesn’t require `l1 dom l2`, instead the current MLS level for the subject must be equal to that of the object: `l1 eq l2`. But instead of introducing new logic to express equality, we’ll use the reflexive property of the $le_{s}$ and $le_{c}$ operators:

$
( slev_{l}(A) le_{s} slev_{l}(B) ) land ( slev_{l}(B) le_{s} slev_{l}(A) ) supset A =_{s} B
$

Notice this deals only with sensitivity so we’d have to show the same for the category component of the MLS label. Accounting for both sensitivity and categorys we would say that if $A = B$ then the reference monitor can conclude $write$ (allow the $write$ operation). We’ll call this implication the write implication for future reference:

$
( ( ( slev_{l}(A) le_{s} slev_{l}(B) ) land ( slev_{l}(B) le_{s} slev_{l}(A) ) ) land ( ( cat_{l}(A) le_{c} cat_{l}(B) ) land ( cat_{l}(B) le_{c} cat_{l}(A) ) ) ) supset write
$

Let’s look at a write request in the context of the MLS ranges for our subject S and object O from the read example above. Again, we’re concerned with the low part of the MLS label so the assumptions are:

$
slev_{l}(S) = Unclassified \
slev_{l}(O) = Secret \
cat_{l}(S) = { C_{0}, C_{1} } \
cat_{l}(O) = { C_{1} }
$

From the read example we know that initially the current MLS level for the subject doesn’t dominate that of the object:

$
slev_{l}(O) notle_{s} slev_{l}(S)
$

so again we’re in the situation where the subect must increase it’s low MLS label to access the object O. Let’s assume this is possible but it’s not enough. We have to account for the category component as well and as we saw above, the category set of the subject and object must be the same. Our subject must then drop $C_{0}$ from its category set as well:

$
slev_{l}(S) = Secret \
cat_{l}(S) = { C_{1} }
$

Now that we’ve increased the low (current) MLS label and dropped the required category from the subject S we can use the logic to prove that our reference monitor will allow the write operation using the write implication from above.

$
Secret le_{s} Secret \
{ C_{1} } supseteq { C_{1} } \
slev_{l}(S) = Secret \
slev_{l}(O) = Secret \
cat_{l}(S) = { C_{1} } \
cat_{l}(O) = { C_{1} } \
slev_{l}(S) leq_{s} slev_{l}(O) \
slev_{l}(O) leq_{s} slev_{l}(S) \
( slev_{l}(S) leq_{s} slev_{l}(O) ) land ( slev_{l}(O) leq_{s} slev_{l}(S) ) \
cat_{l}(S) leq_{c} cat_{l}(O) \
cat_{l}(O) leq_{c} cat_{l}(S) \
( cat_{l}(S) leq_{c} cat_{l}(O) ) land ( cat_{l}(O) leq_{c} cat_{l}(S) ) \
( slev_{l}(S) leq_{s} slev_{l}(O) ) land ( slev_{l}(O) leq_{s} slev_{l}(S) ) land ( cat_{l}(S) leq_{c} cat_{l}(O) ) land ( cat_{l}(O) leq_{c} cat_{l}(S) ) \
( slev_{l}(S) leq_{s} slev_{l}(O) ) land ( slev_{l}(O) leq_{s} slev_{l}(S) ) land ( cat_{l}(S) leq_{c} cat_{l}(O) ) land ( cat_{l}(O) leq_{c} cat_{l}(S) ) supset write \
write
$

An operator to express equality among sensitivities and categories would only serve to shorted the conjunctions in the last two expressions so it may be useful, though only as syntatic sugar.

## The High-Water Mark

Something that we glossed over are the implications of subject actions possibly resulting in a subjects current MLS level increasing. This is a reasonable requirement given that we don’t want to allow a subject to do something like reading a file in one category and then writing data from it to a file with a different category label. We must assume that when the system increases the current MLS level for a subject the operation is safe.

By “safe” we mean there would need to be mechanisms to prevent the user from:

1. setting their low MLS level to anything above their high level
2. that the high level is static and determined by the security administrator
3. that the user cannot decrease their low MLS level without having their environment thoroughly scrubbed to prevent information leakage

These details are only peripheral to the main focus of this post but the details are often what make or break a system. Doing things like sanitizing a users environment to prevent data leakage can be extremely invasive and can disrupt the users work flow significantly.

## Conclusion

There are two things to take away from this post one positive, one negative. First, the logic we’ve been using up to this point is able to represent the reasoning a reference monitor would do when evaluating either a read or a write an access request. This could be used to take a mechanical proof system and use it to implement the core of the SELinux reference monitor. That’s pretty cool.

The negative that’s come to light is that the mechanism by which the “current” MLS label must be managed has a notably temporal aspect. That is to say that when a subject reads or writes data to an object its low MLS label must change to prevent data leakage. All subsequent accesses will then be made in the context of this new label. This is temporal because the sensitivity and categories returned by the functions $slev$ and $cat$ will change based upon the actions of the subject over time.

The management of subject and object labels (basically how they change over time) can, and probably should be a distinct and separate operation from the mechanism making access control decisions. In this way the reference monitor can still us this logic to prove which accesses should be allowed and another separate module using a completely different reasoning engine can be responsible for label management.

## Next Time

This is the last of my posts on MLS in SELinux and representing access control decisions with logic. My next post will be about the work I’ve done implementing the sVirt requirements in a simulated environment to demonstrate the separation properties that can be obtained using MLS categories.

# Understanding Multi-Level Security part #3

There are two parts to a Multi-Level Secuirty (MLS) policy. Now that we’ve covered the sensitivity component it’s time to address the second component which is typically referred to as a category or compartment. Before we get into the rules that govern this policy component however, let’s talk about why we need them.

Sensitivities are a direct reflection of military policy for classifying sensitive data. What’s not typically known is that within each sensitivity level (Unclassified, Secret, Top Secret) there are many sub-compartments for protecting different types of data. An example in the military context would be data about political or diplomatic subjects and data about signal intelligence collection activities.

People working on sensitive data in these different areas of interest likely don’t have any need to access the other category even though both sets of data are at the same sensitivity level. The need for this additional level of separation resulted in the creation of this second component of the policy model.

## Categories / Compartments

The oldest publication that I could find describing a reference monitor expected to enforce MLS categories is from a document describing the ADEPT-50 time sharing system [1]. This system was implemented as one of the first remotely accessible (terminal) main frames enforcing a military security policy. Unfortunately the ACM wants \$15 for the paper which is pretty steep for the casual reader. If your company / university / local library has an ACM subscription then you should definitely pull down this article. It’s a great read.

If you can’t get the article, the relevant parts have been covered in other places. I’d specifically recommend the SELinux Wiki [2] which is accurate and relevant if a bit sparse. Further, Chin and Older [3] also give the topic consideration and apply the similar rules.

From these sources we know that a subject must poses a superset of the categories assigned to the object it wishes to access for a read operation to be granted. We represent the subset relationship where set A is a subset of set B like so:

$
A subseteq B
$

Similarly we can flip the relationship around and say that B is a superset of A like so:

$
B supseteq A
$

Using the language developed previously, we would say that B dominates A if $B supseteq A$.
Note: these are not proper subsets, the relationship holds if A and B are the same set.

## The Logic

Now that we’ve established the rules governing the category component of an MLS policy let’s incorporate these rules into the logic. I’ve been using operators and logic from [3] specifically $slev ()$, short for “security level”. In previous examples this operator has returned the sensitivity level of the parameter (a principle). Now that we’re considering categories we’ll need an operator to return the category component of the principles security label.

For this we define the function $cat ()$ which, much like $slev ()$ takes a principle as its only parameter. What’s unique though is that it returns a set containing the categories assigned to the principle. For a principle with categories ${ C_{0}, C_{1}}$ it would return:

$
cat (P) = { C_{0}, C_{1} }
$

Note: the categories assigned to P may be the empty set: ${}$ if the principle has been assigned no categories.

Now that we have a way to obtain the necessary information about principles we need an operator to compare them. Again we use the operator $le_{s}$ to compare sensitivity labels but we now add a similar operator $le_{c}$ to allow us to perform a comparison of category sets. To do this let’s define two principles $P_{1}$ and $P_{2}$. We define their security labels like so:

$
cat (P_{0}) = { C_{0}, C_{1} } \
cat (P_{1}) = { C_{0} }
$

With this information we can conclude that the security label for $P_{0}$ dominates that of $P_{1}$ from our discussion of sets above. Specifically

$
({C_{0}, C_{1}} supseteq {C_{0}}) supset ({C_{0}, C_{1}} le_{c} {C_{0}})
$

We generalize this implication as:

$
(cat(P_{0}) supseteq cat(P_{1})) supset (cat(P_{0}) le_{c} cat (P_{1}))
$

Recalling our discussion of sensitivity levels and the slev operator we would say that principle $P_{0}$ dominates principle $P_{1}$ if:

$
slev (P_{1}) le_{s} slev (P_{2}) land cat (P_{1}) le_{c} cat (P_{2})
$

For now we’ll assume the sensitivity levels of our principles are all the same so we can focus on the categories alone.

Now that we have all the necessary logic we can construct a more complete example. Considering a read operation, our goal is (as the reference monitor) to prove that the security label of our subject S dominates that of our object O so that we can conclude access should be granted. We represent this as the logical expression:

$
(cat (S) le_{c} cat (O)) supset read
$

Now we need to make some assumptions about the system, specifically defining the categories assigned to the principles S and O (subject and object respectively):

$
cat (S) = {C_{0}, C_{1}}) \
cat (O) = {C_{0}} \
$

From these assumptions we can show that, using the above logic, the categories assigned to S dominate those of O:

$
{ C_{0}, C_{1} } supseteq { C_{0} } \
(cat(S) supseteq cat(O)) \
(cat(S) supseteq cat(O)) supset (cat(S) le_{c} cat (O) \
cat (S) le_{s} cat (O)
$

With this we can conclude (by modus ponens) that this read operation will be allowed

$
cat (S) le_{s} cat (O) \
(cat (S) le_{c} cat (O)) supset read \
$

## Writing to a file

The rules governing file reads is straight forward: if a subject with the two categories $C_{0}$ and $C_{1}$ it should be allowed to read from an object with either category. But, as always, writing to an object is a bit different. This same subject attempting a write to the an object O with category $C_{0}$ raises a very interesting question: the subject may contain data from either category 0 or 1 through a previous read operation, but which category will the data it writes to our object belong to? As stated, our object O has category $C_{0}$ and we don’t have any assurance that the subject won’t write data from category $C_{1}$.

In [1] the authors address this concern for file creation using a “high-water mark” created from a history of all files accessed by a job / process. They do not however differentiate between read and write access on a file. Presumably the high-water mark method would also work for file writes but it would require the reference monitor to track the data categories accessed by a process and ensure that all files written be labeled with the full set of categories previously read by the process.

This loosely stated rule would be quite complex when implemented. It would require a full history of the categories accessed by a process be maintained in the kernel and that this be done for each process. For some processes this list would get quite long and take up significant storage.

An alternative approach is simply to trust the subject to write only data of the appropriate category to the file. This isn’t very viable as it has a significant short fall discussed above. Generally any approach that is described with the phrase “trust the subject” will be rejected by security auditors.

Finally the most restrictive approach would be to require a subject writing to an object to have the same set of categories as the object. This seems like the best approach as far as security is concerned but it is very restrictive and could cause over classification of newly created files if the user is not diligent.

## Conclusion

The concept of categories / compartments from MLS policy fits naturally into the logic we’ve been using to reason over sensitivities in my previous posts. It’s a simple concept based on set comparisons that a reference monitor can do quickly and efficiently. This policy’s application to reading files is simple but the best way to handle writes comes with significant complications which we discussed here briefly.

Instead of trying to solve this last complication in this post I’m punting. In my next post however we’ll get into the SELinux policy. We’ll take a look at how they deal with categories in the reference policy focusing on the read operation to make concrete the logic from this post. Then we’ll move on to how refpol deals with categories in file write operations hopefully shedding some light on how a real system manages the complexity discussed in the last section of this post.

## References

1 C. Weissman. 1969. Security controls in the ADEPT-50 time-sharing system. In Proceedings of the November 18-20, 1969, fall joint computer conference (AFIPS ’69 (Fall)). ACM, New York, NY, USA, 119-133. DOI=10.1145/1478559.1478574 http://doi.acm.org/10.1145/1478559.1478574
2 SELinux Wiki. Managing Security Levels via Dominance Rules. http://selinuxproject.org/page/NB_MLS#Managing_Security_Levels_via_Dominance_Rules
3Chin, Shiu-Kai and Older, Susan. Access Control, Security and Trust CRC Press, 2011. http://www.amazon.com/Access-Control-Security-Trust-Cryptography/dp/1584888628

# Understanding Multi-Level Security part 2

In my last post I gave a brief description of the two governing principles of Multi-Level security. I then used a short-hand version of a logical framework from Chin and Older to explain how a reference monitor may use such a logic to make access control decisions.

In this post I go one step further to look at the MLS component of SELinux policy. I’ll relate the MLS constraints from SELinux to the logical framework from Chin and Older and show how I’ve used their syntax to help debug SELinux denials (AVC denied) in my own policy writing.

## SELinux

The type enforcement (TE) model implemented in SELinux is very different from that of the MLS model. TE is a completely different approach to confining subjects and objects. Interesting enough though, to remain relevant in the government space (i.e. to get Common Critieria certified at the EAL4 level against the LSPP), SELinux was extended with the mechanisms necessary to implement an MLS policy. A significant portion of this work was done by TCS and is documented in a brief white-paper [1].

This paper describes the changes to the SELinux implementation in the kernel that were necessary, but assumes the reader is intimately familiar with government security policy. To fill that gap I’ve taken the logical rules from Chin and Older described previously and used them to explain the constraints that implement the MLS policy in SELinux.

## SELinux Constraints

WE’ll start with a brief description of the SELinux constraint syntax. The interested reader I’ll refer the to the base constraint file in the SELinux source tree: \$SELINUX_SRC/policy/constraints. In short, constraints allow for the policy author to define additional rules that must be satisfied beyond the SELinux type enforcement policy.

```constrain dir_file_class_set { create relabelto relabelfrom }
(
u1 == u2
or t1 == can_change_object_identity
);```

The first keyword is constrain which simply identifies the constraint. Next comes a set of object types, followed by a set of permissions. When making an access control decision the security server will look at the requested permission and the object type and if they both are members of these sets the constraint will be evaluated. The constraint itself is a simple set of logical expressions. The parameters available for use in these expressions are the user, role and type component of the labels of both subject and object in the request. Components of the subject label are suffixed with 1, objects with 2.

In the expression above we see that the constraint requires that for a file or directory to be created or relabeled the user component of the subject and object labels cannot change (thus they must be equal or ==).
ASIDE: This constraint may seem like a mundane detail but it’s actually quite significant. Historically UNIX operating systems allowed users with sufficient privileges to change their identity to that of another user to perform administrative tasks. This constraint says that SELinux will never allow the user to change their identity (in the SELinux context) making attribution possible.

Examining the constraint further we see there is a loop hole that allows a subject with a specific type attribute (can_change_object_identity in this case) to change the user identity on a file or directory. Loop holes like this are unfortunate but are required for real systems to function.

## MLS Constraints

The rules for MLS Constraints are the same as standard constraints. The only differences are:

• subject and object labels available for use have the additional MLS label components
• new operators which allow for the comparison of sensitivity labels

The MLS components of the SELinux labels come in two parts: the low clearance and the high clearance. These are pretty self explanatory as they represent the lowest possible clearance for a principle and the highest clearance the principle may access.
The new operators are:

1. eq: two MLS label components are equal
2. dom: one MLS label component dominates another
3. domby; same as dom but reversed
4. incomp: two MLS label components are incomparable

I’ll explain the important operators as well as an essential MLS property with an example:

```mlsconstrain { dir file lnk_file chr_file blk_file sock_file fifo_file } { read getattr execute }
(( l1 dom l2 ) or
(( t1 == mlsfilereadtoclr ) and ( h1 dom l2 )) or
( t1 == mlsfileread ) or
( t2 == mlstrustedobject ));
```

This rule constraints read operations on a rather large set of object types. The first line of the statement is the expected simple security property: the low clearance of the subject must dominate the low clearance of the object. Simply put this allows subjects to read at or below their lowest clearance.

The second line allows a subject to read an object if it’s highest clearance dominates that of the object AND it’s type is that of ‘mlsfilereadtoclr’ which is short hand for ‘mls file read to clearance’. This is a special set of subjects that are allowed to read to their highest clearance if they’re operating with a ranged label. Most processes will run with only one MLS label so this constraint will rarely come into play except for processes like init.

The remainig two lines are again, types that allow for loop holes. A process with this type would be considered a “trusted process” in the literature which basically means it’s a process allowed to violate the security model. We’d prefer to have none of these if possible but they exist for a number of reasons. These could be poorly coded software that can’t be fixed by those who care about these things because they’re proprietary or they’re simply processes that must violate MLS to perform their task.

## Evaluating MLS Constraints in the logic

Using the logic presented in my previous post to reason through an access control decision based on the above SELinux constraints requires we have a way to evaluate expressions using the operators from the constraint language. I’ll address them in the order they’re presented above:

1. eq: equality
2. Testing the equality of two sensitivity levels is straightforward. Using the $slev$ operator we get the sensitivity of a principle and then compare the two using a simple comparison of the strings from our partial ordering. If our reference monitor is optimized it may map the strings to integers (as is the case with SELinux) for a faster comparison but the principle holds.

3. dom: dominance
4. This operator is well developed by Chin and Older in their $le_{s}$ operator and was the subject of my previous post.

5. domby: variation on dominance
6. This is simply the reverse of the dominance rule and I think it’s included as syntactic sugar.

7. incomp: incomparable
8. This operator is only valid when considering the category component of an MLS label and is deferred for future discussion of the Multi-Category Security (MCS) policy.

Now that the operators are sorted out we can use the constraint in an access control decision. I’m using short hand just to keep this post from turning into a book so the logicians among your may groan a bit …
First we must state the partial ordering for secrecy lables in our system. These should be familiar by now:

$
Unclassified le_{s} Secret le_{s} Top Secret
$

Next we define a subject and an object in our system and assign them both secrecy levels:

$
slev (O) == Unclassified \
slev (S) == Secret
$

Now that we’ve defined an Unclassified object and a Secret subject we can revisit the simple security property and the additional rules from the SELinux MLS constraint. I’m going to make up some functions and operators out of thin air so brace yourselves:

$
(slev_{l}(S) le_{s} slev _{l}(O_{l})) cup \
(type (S) == mlsfilereadtoclr) cap \
(slev_{h}(S) le_{s} slev_{l}(O)) cup \
(type (S) == mlsreadfile) cup \
(type (O) == mlstrustedobject) supset read
$

First I’ve augmented the $slev$ function such that it can deal with principles having a high and low sensitivity level. I’ve done so by adding a subscript of ‘l’ for the low sensitivity and an ‘h’ for the high sensitivity.

Second I’ve added another function similar to $slev$ called $type$. Like $slev$ it allows us to access a component of the security label for a principle. Instead of the classification however it returns the type component of the SELinux label. For the sake of completeness lets give our subject and object types:

$
type (S) == type_a \
type (O) == type_b
$

Now to it’s just a matter of putting it all together. First from the type and sensitivity assignments for our subject ‘S’ and object ‘O’ we can replace instances of $slev$ and $type$. I’ve only defined one sensitivity label for each object so both low and high labels are the same.

$
((Secret le_{s} Unclassified) cup \
(Secret le_{s} Unclassified)) cup \
$

Since neither neither subject or object we’re considering have the types listed in this constraint we can quickly mark those comparisons as failing. Further we can simplify terms that have become ‘false’ and we get:

$
$

In this simple case we’ve reduced the constraint to the simple security property. Further using the partial ordering we’ve defined we can see that Secret dominates Unclassified and our read request should be allowed by the reference monitor.

## Conclusion

This very brief exercise shows the application of a relatively new logical framework to the analysis of a specific SELinux MLS constraint. I’ve found this type of reasoning extremely useful in debugging MLS policy as the available tools (audittoallow) provide no useful information about denials caused by constraints. The exercise performed in this post is equally applicable to any other MLS constraint from the reference policy so if you’re interested give it a try with the constraint for common write operations on files and prove the ★-property.

Up next is the MCS policy …

## References

1 Chad Hanson, SELinux and MLS: Putting the Pieces Together, TCS

# Understanding Multi-Level Security Part 1

In my last post I introduced a topic I’ve been working on over the past year. That post provided a description of the problem and it’s importance, but didn’t get into the details behind the technologies used in the solution I’ve worked up. This post is the first of several to come that will fill in these details.

## Purpose

I’ve been having a hard time sorting out the best way to present the relatively complex topic of computer security policy and it’s application to my specific project. My first few attempts to write this up resulted in some very convoluted ramblings.

It seemed like I was starting in the middle and missing all the background. Hoping to fix this I’m starting from the beginning with a quick explanation of the security policy that’s at the core of my project: military security policies and their implementation in SELinux.

To explain this I’m using a logical framework put forward by Chin and Older in their recent book [1]. I took a class with Dr. Chin this past fall and was particularly impressed with the contents of the book and the clear, concise syntax they developed.

## Classifications

Everyone who’s seen a good spy movie has heard references to military security policy. I think it’s safe to day that every James Bond or Jason Bourne movie made a few references to Unclassified, Secret or Top Secret data. These are typically referred to as classifications or sensitivity levels in the literature. Everyone who’s seen these movies understands what the policy is, but few people have a solid grasp on the mathematical rules that govern access control systems implementing this policy.

It really is as simple as it seams though and easy to explain in the context of subjects (generally people or computer processes acting on their behalf) and objects (either physical or electronic). There are two governing principles [2]:

• the simple security property: a subject operating at a given sensitivity level can read objects at the same level or below
• the ★-property: a subject operating at a given sensitivity level can write to objects at the same level or higher

A computer system that implements these two rules is said to implement a multi-level security (MLS) policy.

Inherent in these two rules is a partial ordering among classification labels. Chin and Older describe this ordering by first defining a function that produces the sensitivity of a principle or object: $slev (X)$. For a given object O that is classified Secret we would say:

$
slev (O) = Secret
$

They also define an operator: $le_{s}$ that describes the relationship between two sensitivity levels. Going back to the sensitivity levels we’re so familiar with we would say:

$
Unclass le_{s} Secret le_{s} TopSecret
$

Simply put a TopSecret principle is more sensitive than (or dominates) a Secret principle, which dominates an $Unclass$ principle. Combining these two operators we can now describe these relationships in an access control scenario.

### The Simple Security Property

Getting back to the security properties for military data we can use the above syntax to describe the system. The simple security property requires that a subject operating at a given security level can read objects at the same level or below. We’d such a situation as:

$
(slev (O) le_{s} slev (S)) supset read
$

Above we assume the ‘O’ and ‘S’ are an object and a subject in our system respectively. By the definition of $slev ()$ we know it returns the sensitivity level of its parameter so let’s say that object O is classified Secret and subject S is classified Top Secret. We could then say:

$
$

Knowing that $S le_{s} TS$ is true by the ordering we’ve defined and our knowledge of logical implication (modus ponens WHAT!) we can deduce $read$ from this for future use in our logical framework and we take this to mean a read operation would be granted. Chin and Older provide a more rigorous approach but for our purposes the short hand I’m using is sufficient (because I say so).

### The ★-Property

The ★-property requires that the sensitivity level of a subject is lower than or equal to that of the object for a write access to be permitted. We’d represent this as:

$
(slev (S) le_{s} slev (O)) supset write
$

If ‘S’ and ‘O’ are the same subject and object as in the previous example the values of $slev (S)$ and $slev (O)$ will again be top secret and secret respectively. Thus:

$
TS le_{s} S supset write
$

Using the same partial ordering defined previously we can see that the first part of the implication here is false. This time from our knowledge of logical implication we can’t say anything about the term on the right (the $write$ statement), so we cannot say a write would be granted. It would then be denied by any sane access control implementation. Again I’m using short hand.

## Conclusion

This brief introduction was an attempt to frame basic MLS constructs in a simple logic developed by Chin and Older. Alone this post won’t mean much to most people but it’s intended as a foundation. In my next post I’ll introduce SELinux constraints, MLS constraints and use the logic from this post to illustrate how a reference monitor would evaluate constraints in an access control decision.

## References

1 Chin, Shiu-Kai and Older, Susan. Access Control, Security and Trust CRC Press, 2011.
2 Bell, David Elliott and LaPadula, Leonard J. Secure Computer Systems: A Mathematical Model, MITRE Corporation, 1973.
3 Chad Hanson, SELinux and MLS: Putting the Pieces Together, TCS

# Another Go at My Masters Project

About a year ago I started working on my masters project. The topic turned out to be … not interesting enough to hold my attention I guess. This ended up being for the best since shortly after losing interest in my first project I got picked up on a project at work that I managed to dual-purpose for both work and school. I’m at a point where all I need to do to graduate is write this work up.

This first post (with a new tag) is a brief introduction of what this work is and why you should care. Future posts on this topic will cover the technical background and an implementation, both a simulation and code targeted at a specific Xen-based platform.

## SELinux and Virtualization

I’ve been playing with SELinux for a while now and more recently I’ve had the need to work with the Xen hypervisor. Specifically I’ve been tasked with getting an SELinux policy running in the management VM (a.k.a. dom0). Most people savvy on SELinux would simply use a Linux distro that supports SELinux out of the box as their dom0 like Debian, Fedora, CentOS / RedHat. This is a valid argument but a minimal Debian install has a root file system that’s about about a gigabyte in size and this was too big for our purposes.

The project was actually in full swing when I was tasked with SELinux integration and the team had already rolled their own dom0 using openembedded [1]. Getting SELinux up and running on an OE system was a pain but it’s done and that work isn’t all that interesting. What was interesting was implementing some policy and machinery to separate the instances of QEMU [2] that provide emulation for the HVM Windows guests [3].

SELinux does a great job of formalizing the interactions between processes. Any interactions that aren’t specified are prevented. I won’t go into a detailed explanation of the SELinux policy language [4] or Domain and Type Enforcement [5]. These are well documented elsewhere [6].

## The Problem

What’s interesting is that on an SELinux system each process runs with a label defined by the label of it’s parent (the process that caused the execution) and the label on the programs binary file on disk. This means that a process performing multiple fork/exec calls on the same binary will produce child processes all with the same SELinux label. Under nearly all circumstances this is the desired effect. On our specific platform there is a case where this isn’t what we want.

As our platform is Xen, xend is the daemon responsible for managing VMs. When xend starts an HVM client it executes QEMU. This causes all QEMU instances running to have the same SELinux label. In SELinux speak we would say these QEMU instances are all running “in the same domain” which is equivalent to them all having the same permissions, or more precisely, the same access to system resources.

At this point understanding the separation goals of Xen is important. Where an OS kernel aims to keep running processes separate, Xen aims to keep running VMs separate. Having instances of QEMU operating on behalf of what are effectively untrusted client VMs all with the same SELinux label undermines this goal of separation.

## Separating QEMU Instances

The sVirt [7] project specifically addressed this problem with a prototype implementation some time back. Eventually this was integrated with libvirt so if you’re running libvirt[8] with the SELinux security driver[9] loaded then presumably you’ve got these protections in place. But again due to the embedded nature of our dom0 libvirt was way more software than we needed.

It became necessary to implement the basic sVirt design but integrated directly into our management stack. This isn’t a line-for-line re-implementation of the libvirt SELinux code though. I’ve made a number of changes that I will discuss in detail in the following posts though the design goals remain the same: keep instances of QEMU separate. The metric we’re using to judge the usefulness of this work is our answer to the question “what could a compromised QEMU instance access?”. If the answer to this is “all of the virtual hard disks on the system” then obviously we’ve failed. Instead we’re aiming to confine a QEMU instance to the resources it needs to function, like the virtual disks belonging to the VM it is providing services to.

## Next Time

Now that the basic problem is laid out the next step is to cover some background.
In my next post I’ll cover the background necessary to understand the specific pieces of the SELinux policy that this work uses to achieve these goals: the MCS policy.

## References

1 the Open Embedded project: http://www.openembedded.org/index.php/Main_Page
2 the QEMU project: http://wiki.qemu.org/Main_Page
3 Xen-HVM: http://en.wikipedia.org/wiki/QEMU#Xen-HVM
4 SELinux policy language: http://selinuxproject.org/page/PolicyLanguage
5 Practical Domain and Type Enforcement for UNIX: http://portal.acm.org/citation.cfm?id=882491.884237
6 SELinux By Example: http://www.informit.com/store/product.aspx?isbn=0131963694
7 sVirt Requirements Document: http://selinuxproject.org/page/Svirt_requirements_v1.0
8 The libvirt Project: http://libvirt.org/
9 libvirt SELinux Driver: http://libvirt.org/drvqemu.html#securitysvirt

# Laptop Docking Script and SELinux

I’ve continued to make progress in my efforts to get SELinux working on my laptop. My day job has been pretty demanding these days but I ran into a few interesting AVCs and had to come up with a fix that is worth mentioning. Let’s start with the AVCs:

```avc:  denied  { read } for  pid=530 comm="dock.sh" name=".X11-unix" dev=dm-1 ino=1179649 scontext=system_u:system_r:udev_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xdm_tmp_t:s0 tclass=dir
avc:  denied  { open } for  pid=530 comm="dock.sh" name=".X11-unix" dev=dm-1 ino=1179649 scontext=system_u:system_r:udev_t:s0-s0:c0.c1023 tcontext=system_u:object_r:xdm_tmp_t:s0 tclass=dir
```

There’s only two AVCs so it’s a pretty easy fix right? Just allow udev access to xdm’s tmp files … yeah I’m not convinced. What the heck is udev doing reading tmp files belonging to X? Yeah and what’s the command “dock.sh” all about (from the comm section in the AVC log).

A little digging in the udev rules files shows that this script is run when my laptop gets docked and undocked. So it makes sense that this script would manipulate my X configuration but it shouldn’t be digging around in private files belonging to X. Whoever wrote this script messed up so just granting udev additional permissions isn’t the right approach.

This time we have to get in touch with the author and fix the script. Interestingly enough this is one of my scripts! I wrote about this script some time ago and it looks like it’s time to update it.

### Rewriting dock.sh with SELinux in mind

This is one of those situations where questioning these accesses is the right thing to do. But it’s not that the script shouldn’t be reconfiguring X, it’s that it shouldn’t be messing with X’s private tmp files. Really this has less to do with security than it does with good program design. These are temporary files and X may change their location and contents at any time. No other program should depend on them.

There’s a surprisingly easy fix to this problem and it’s a pretty cool one. ConsoleKit is a daemon that keeps track of desktop “sessions” which are effectively X logins. This is a gross over simplification and the interested reader should dig into the ConsoleKit docs for a more accurate description. What’s important is that the information that my dock.sh script needs is exposed by ConsoleKit through a few DBus calls.

All of the direct access to tmp in this script can be replaced by 3 calls to ConsoleKit through DBus. I’ve updated the script in my previous post where you can check it out if you’re interested.

### Conclusion

This post ended up having less to do with SELinux and more to do with good programing practices. Write your programs / scripts to use the published interfaces available to accomplish their goals. Mucking around in another programs temporary files is a bad idea (shame on me!). Where SELinux comes into play is that it formalizes these accesses in policy forcing some bad programming practices out into the open.

If you’re following along at home you’ll notice that this script went from causing 2 AVCs to causing about 10 🙂 I’m OK with that because the script is that much better than it was. Looks like this script needs to have it’s own policy module. I’ll save that for later this week. Stay tuned.