sVirt-like prototype

We’re getting close to the end of my on-going series exploring the SELinux mlsconstrain. Now that we’ve gone though and used some simple logic to reason through access control decisions it’s time for a simple and practical application.


In my first post under my MastersProject tag I laid out some of the justification for why separating QEMU instances with SELinux policy in a Xen dom0 is important. To reiterate: my goal is to ensure that no qemu instance can read or write to the virtual disk belonging to another qemu instance. These are the same goals as the sVirt [1]
project which is responsible for the SELinux driver in libVirt [2]. We’ll use the same MLS categories we’ve been discussing to achieve the desired separation and we’ll validate our design with the logic previously discussed.


A simulation like this requires a system with a functional SELinux policy (running in enforcing mode). We assume that the user is running in an unconfined domain as this is how my development system is configured.

Simulation Design

As always implementing a prototype/simulation first before starting in on the actual implementation is always a good idea. For our purposes this simulation will serve to familiarize us with the necessary bits of the SELinux API required to achieve our goals (will ease debugging later) and to prove to us through testing that our prototype can in fact fulfill the requirements. The simulation has three main parts:

SELinux-specific code

The central piece of this work is the code specific to managing SELinux labels on virtual disks (vhd files) and on running QEMU processes. We select a unique category for each running VM and the properties of these categories provide us with the separation we desire. There’s no primitive to perform this task explicitly so we generate unique category numbers using a naive algorithm.

Each time a QEMU instance is started a category number is generated at random and a list of allocated categories is searched to prevent duplicates. If the category is already in use, another number will be generated and the list searched again. This continues until an unused category is found. NOTE: This is the least efficient algorithm possible but for a simulation we only need a functional algorithm, not a perfect one. This would be a prime candidate for improvement in a real implementation.

The algorithm for generating unique categories I’ve described above relies on a test for equality. Remember our previous discussion of the mlsconstrain on read and write operations. Write permissions required that the subject and object labels be equal. Read permissions required that the subject label dominate the object label.

With this in mind it may seem that testing for equality alone wouldn’t be sufficient. But since an equality test is more strict than a test for dominance it is sufficient to test for equality only.

Finally the SELinux specific code must ensure that the unique category we’ve just generated is applied to the necessary subjects and objects. Firstly the VHD files belonging to the QEMU instance must be labeled with this category. Selecting the proper label, adding the category and changing the label of the VHD file requires the use of libselinux and the code can be found in the prototype sources at src/sec-selinux.c. We reserve the first category (c0) for VHD files that are currently not in use and ensure that no running QEMU processes ever get this category.

The QEMU process itself must also be labeled with the unique category when it is exec’d. Determining the label for the process is much the same as for the VHD file, but we must call the setexeccon function before forking the QEMU instance instead of directly manipulating the label on the running process.

While the man pages for these API calls are a great documentation source I’ve found that currently on Debian they are incomplete. Specifically functions like selinux_virtual_domain_context_path are undocumented in the man pages. A better source as of now can be found at the SELinux Wiki API page [3].

Mock QEMU Binary

Our prototype needs a binary to start up and test the limits of the confinement we’ve set up. We don’t intend to test the QEMU reference policy, only to test the limits of the confinement properties of the categories we’re assigning to each QEMU instance. As such we don’t want to use QEMU itself. Instead we want a simple binary that attempts to exceed its intended behavior.

As we’re aiming to limit access to the VHD files assigned to individual QEMU instances we want our mock QEMU to attempt to access the VHDs that don’t belong to it. We define two modes of operation for this program. First is what we term “good behavior” in which the process will only read and write to a file we specify on it’s command line. The second mode of operation we term “bad behavior” in which we specify a file on it’s command line again, but this time it will attempt to read and write to all files in that same directory.

To allow us to observe these fake QEMU instances at run-time I’ve added a number of syslog calls. These document the success or failure of attempted actions. Using this output we can observe the effectiveness of the protections offered by this system. This code is in the file: src/not-qemu.c

VM Driver

The final component is a text driven interface for manipulating the simulation. This component takes the place of the standard virtualization tool stack which creates, starts, stops and destroys VMs. We expose four primary actions the user may choose:

  1. Create VM: create internal representation of a VM including SELinux label and mock VHD file.
  2. Start VM: launch the mock QEMU process for a VM in the assigned SELinux domian.
  3. Stop VM: stop the mock QEMU process.
  4. Delete VM: free up the assigned SELinux category and mock VHD file for the specified VM.

With an interface like this the user can control the simulation explicitly. Tailing the output from syslog will allow the user to observe the behavior of the simulation in real time where they can judge for themselves whether or not the confinement is effective.


The policy for this simulation contains all of the necessary permissions for each component to function. There are also a number of permissions that are specific to the simulation. Any implementation directly in a tool stack won’t require any special permissions for SELinux users or groups as the stack will run as the system user in system role which is also the case for QEMU processes.

Our simulation is meant to be interactive and we assume that the user is unconfined. The user and role for the simulated qemu instances will be system_u and system_r respectively so some additional permissions are required by the simulation driver to allow changing the user and role label components when the qemu simulation is started and when labeling VHD files. This is simply a side effect of making the simulation interactive for the unconfined user.

The interesting parts are as follows:

allow mcs_server_t self:process { setexec };
allow mcs_server_t mcs_server_image_type:file { relabelfrom relabelto };

        range_transition mcs_server_t mcs_server_image_t:file s0:c0;


You can see that we’ve defined two types, one for the driver (mcs_server_t) and one for the VHD files used by the simulated qemu instances (mcs_server_image_t). The server can manage the VHD files, it can read default contexts (required to use the libselinux API calls we need), and it can read from /dev/urandom which our category selection algorithm requires. The only nuance here is the range_transition which ensures that when the mcs_server_t type creates a new VHD file it is created with the proper context.

The permissions required for interaction between the mcs-server and the not-qemu instances is implemented as an interface in the mcs-server module (interface):

                role system_r, unconfined_r;
                type mcs_server_t;

	type $1_t;
	role system_r types $1_t;
	allow unconfined_r system_r;

	type $1_exec_t;

        allow mcs_server_t $1_t:process { getattr getsched setsched signal signull sigkill };
	domtrans_pattern(mcs_server_t, $1_exec_t, $1_t)

	# Only required in example where user directly invokes mcs-server.
        #   In actual implementation user will be system_u for source and
        #   destination domains so no exemption will be required.
	role_transition unconfined_r $1_exec_t system_r;

This is a slightly strange pattern as the not-qemu module pretty much calls this one interface and that’s it. This allows the mcs-server to transition to the calling domain through an executable. It also allows the calling domain to read and write to read and write VHD files. There are two permissions at the end of the interface that are only required for our interactive simulation and they’ll be removed in a real implementation.


This simulation and the accompanying SELinux policy can be obtained from a public git treen on this server: git:// Just download it, build it and give it a try. Be sure to read the requirements above and be sure you’ve set all that up because there’s a lot of moving parts behind the scenes with the build system and stuff.


So that’s a brief description of a simulation implementing the sVirt design in an interactive prototype. It’s still very rough but has been extremely useful as a learning tool for myself. Next time I’ll do a walk through of the simulation and discuss some techniques we can use to be sure it’s functioning as expected.


1 sVirt Requirements Document:
2 SELinux sVirt confinement in libVirt:
3 API Summary for libselinux:

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s