6. Implementation of jail in the FreeBSD kernel.

6.1. The jail(2) system call, allocation, refcounting and deallocation of `struct prison`.

The jail(2) system call is implemented as a non-optional system call in FreeBSD. Some other system calls are controlled by compile-time options in the kernel configuration file, but due to the minute footprint of the jail implementation, it was decided to make it a standard facility in FreeBSD.

The implementation of the system call is straightforward: a data structure is allocated and populated with the arguments provided. The data structure is attached to the current process's struct proc, its reference count is set to one, and a call to the chroot(2) syscall implementation completes the task.
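
The sequence above can be modelled in userland as a minimal sketch. Everything here is an assumption for illustration: `struct prison` and `struct proc` are cut down to a few fields, `struct jail_args`, `jail_sketch()` and `chroot_sketch()` are hypothetical names, and "chroot" merely records the new root path. The sketch also ignores the case of a process that is already jailed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct prison. */
struct prison {
    int      pr_ref;        /* number of processes referencing the jail */
    char     pr_host[256];  /* hostname configured for the jail */
    uint32_t pr_ip;         /* the jail's IPv4 address, host byte order */
};

/* Hypothetical, minimal stand-in for struct proc. */
struct proc {
    struct prison *p_prison;    /* NULL while the process is not jailed */
    char           p_root[256]; /* path of the process root directory */
};

/* Arguments passed in from userland (simplified, hypothetical). */
struct jail_args {
    const char *path;
    const char *hostname;
    uint32_t    ip;
};

/* Stand-in for the kernel's chroot(2) implementation. */
static int chroot_sketch(struct proc *p, const char *path)
{
    if (strlen(path) >= sizeof(p->p_root))
        return ENAMETOOLONG;
    strcpy(p->p_root, path);
    return 0;
}

/* Allocate and populate a prison, attach it with one reference, chroot. */
static int jail_sketch(struct proc *p, const struct jail_args *ja)
{
    struct prison *pr = calloc(1, sizeof(*pr));
    int error;

    if (pr == NULL)
        return ENOMEM;
    snprintf(pr->pr_host, sizeof(pr->pr_host), "%s", ja->hostname);
    pr->pr_ip = ja->ip;
    pr->pr_ref = 1;                 /* the calling process's reference */
    p->p_prison = pr;               /* attach to the current process */

    error = chroot_sketch(p, ja->path);
    if (error != 0) {               /* undo the attach if chroot fails */
        p->p_prison = NULL;
        free(pr);
    }
    return error;
}
```

Undoing the attach on a chroot failure is a choice made for this sketch, so that a failed call leaves the process unjailed.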

Hooks in the code implementing process creation and destruction maintain the reference count on the data structure and free it when the last reference is lost. Any new process created by a process in a jail inherits a reference to the jail, which effectively puts the new process in the same jail.
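
A rough model of the two hooks follows, assuming a prison that carries only a reference count. The hook names `prison_fork_hook()` and `prison_exit_hook()` and the `prisons_freed` counter are hypothetical and exist only for this illustration; a real kernel would also need locking around the count.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model; all names here are hypothetical. */
struct prison {
    int pr_ref;               /* processes currently inside this jail */
};

struct proc {
    struct prison *p_prison;  /* NULL for unjailed processes */
};

static int prisons_freed;     /* instrumentation for the example only */

/* Process-creation hook: the child inherits the parent's jail. */
static void prison_fork_hook(const struct proc *parent, struct proc *child)
{
    child->p_prison = parent->p_prison;
    if (child->p_prison != NULL)
        child->p_prison->pr_ref++;
}

/* Process-destruction hook: drop the reference, free on the last one. */
static void prison_exit_hook(struct proc *p)
{
    struct prison *pr = p->p_prison;

    if (pr == NULL)
        return;
    p->p_prison = NULL;
    assert(pr->pr_ref > 0);
    if (--pr->pr_ref == 0) {
        free(pr);
        prisons_freed++;
    }
}
```

Because the child simply copies the parent's pointer, there is no path by which a forked process can end up outside its parent's jail.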

There is no way to modify the contents of the data structure describing the jail after its creation, and no way to attach a process to an existing jail unless the process was created from inside that jail.

6.2. Fortification of the chroot(2) facility for filesystem name scoping.

A number of ways to escape the confines of a chroot(2)-created subscope of the filesystem view have been identified over the years. chroot(2) was never intended to be a security mechanism as such, but even so the ftp daemon largely depended on the security provided by chroot(2) to provide the ``anonymous ftp'' access method.

Three classes of escape routes existed: recursive chroot(2) escapes, ``..''-based escapes and fchdir(2)-based escapes. All of these exploited the fact that chroot(2) didn't try sufficiently hard to enforce the new root directory.

New code was added to detect and thwart these escapes, amongst other things by tracking the directory of the first level of chroot(2) experienced by a process and refusing backwards traversal across this directory, and by refusing chroot(2) if file descriptors referencing directories were open.
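
The two defences can be illustrated with a toy directory tree. This is a sketch under stated assumptions, not the kernel's lookup code: `struct vnode`, the fields `p_cdir`, `p_rdir`, `p_jdir` and `p_fd`, and the functions `lookup_dotdot()` and `chroot_sketch()` are hypothetical simplifications. As with chroot(2), changing the root here does not change the current directory, which is what makes the recursive escape possible in the first place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of directories and per-process state. */
struct vnode {
    struct vnode *v_parent;     /* NULL at the real filesystem root */
    int           v_isdir;      /* nonzero for directories */
};

#define NOFILE_SKETCH 8         /* size of the toy descriptor table */

struct proc {
    struct vnode *p_cdir;               /* current directory */
    struct vnode *p_rdir;               /* current root directory */
    struct vnode *p_jdir;               /* first-level chroot dir, or NULL */
    struct vnode *p_fd[NOFILE_SKETCH];  /* open descriptors, NULL if unused */
};

/* ".." never climbs above the current root or the first-level chroot. */
static void lookup_dotdot(struct proc *p)
{
    struct vnode *dp = p->p_cdir;

    if (dp == p->p_rdir || dp == p->p_jdir || dp->v_parent == NULL)
        return;                 /* stay put: ".." of a root is itself */
    p->p_cdir = dp->v_parent;
}

/* chroot() that refuses while any descriptor references a directory. */
static int chroot_sketch(struct proc *p, struct vnode *newroot)
{
    for (int i = 0; i < NOFILE_SKETCH; i++)
        if (p->p_fd[i] != NULL && p->p_fd[i]->v_isdir)
            return EPERM;
    if (p->p_jdir == NULL)
        p->p_jdir = newroot;    /* remember the first level of chroot */
    p->p_rdir = newroot;        /* like chroot(2), cwd is left unchanged */
    return 0;
}
```

In the recursive escape, a process already confined to a directory chroots again into a subdirectory while its working directory stays put, so its working directory now lies outside its root. With `p_jdir` remembered, walking ".." from there still stops at the first-level chroot directory.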

6.3. Restriction of process visibility and interaction.

A macro was already available in the kernel to determine if one process could affect another process. This macro did the rather complex checking of uid and gid values. It was felt that the complexity of the macro was approaching the lower edge of the IOCCC entrance criteria, and it was therefore converted to a proper function named p_trespass(p1, p2), which does all the previous checks and additionally checks the jail aspect of the access. The check is implemented such that access fails if the origin process is jailed but the target process is not in the same jail.
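
The following sketch shows how the jail test can be layered in front of the credential checks. The uid/gid logic is reduced to "same real uid, or root", the structure fields are hypothetical, and the choice of ESRCH for the jail case (so the target looks nonexistent) is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct prison { int pr_ref; };

struct proc {
    unsigned       p_ruid;    /* real uid (credentials heavily simplified) */
    struct prison *p_prison;  /* NULL when not jailed */
};

/* Returns 0 if p1 may affect p2, otherwise an errno value. */
static int p_trespass_sketch(const struct proc *p1, const struct proc *p2)
{
    if (p1 == p2)
        return 0;
    /* Jail check first: a jailed process only reaches its own jail. */
    if (p1->p_prison != NULL && p1->p_prison != p2->p_prison)
        return ESRCH;
    if (p1->p_ruid == p2->p_ruid)
        return 0;
    if (p1->p_ruid == 0)      /* root passes the uid checks */
        return 0;
    return EPERM;
}
```

Placing the jail test before the uid tests means a jailed root never gets as far as the "root may do anything" rule when the target lies outside its jail.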

Process visibility is provided through two mechanisms in FreeBSD, the procfs file system and a sub-tree of the sysctl tree. Both of these were modified to report only the processes in the same jail to a jailed process.
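
Both mechanisms come down to the same filter applied while the process list is produced. Below is a minimal sketch, assuming a flat array as the process table; `proc_visible()` and `list_visible_pids()` are hypothetical helpers, not the procfs or sysctl code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct prison { int pr_ref; };
struct proc { int p_pid; struct prison *p_prison; };

/* Can `viewer` see `target`? Unjailed viewers see everything. */
static int proc_visible(const struct proc *viewer, const struct proc *target)
{
    return viewer->p_prison == NULL || viewer->p_prison == target->p_prison;
}

/* Copy the pids visible to `viewer` into out[]; returns how many. */
static size_t list_visible_pids(const struct proc *viewer,
                                const struct proc *table, size_t n,
                                int *out, size_t outlen)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < outlen; i++)
        if (proc_visible(viewer, &table[i]))
            out[count++] = table[i].p_pid;
    return count;
}
```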

6.4. Restriction to one IP number.

Restricting TCP and UDP access to just one IP number was done almost entirely in the code which manages ``protocol control blocks''. When a jailed process binds a socket to an address, the IP number provided by the process is not used; instead the pre-configured IP number of the jail is used.
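
A sketch of the bind-time substitution, following the description above literally: whatever address a jailed process asks for, the jail's address is used. Addresses are plain host-byte-order integers, and `bind_address()` is a hypothetical helper rather than the protocol control block code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures. */
struct prison { uint32_t pr_ip; };   /* jail IPv4 address, host byte order */
struct proc { struct prison *p_prison; };

/* Address actually used for a bind() request. */
static uint32_t bind_address(const struct proc *p, uint32_t requested)
{
    if (p->p_prison == NULL)
        return requested;        /* unjailed: honour the caller's choice */
    return p->p_prison->pr_ip;   /* jailed: always the jail's address */
}
```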

BSD-based TCP/IP network stacks sport a special interface, the loop-back interface, which has the ``magic'' IP number 127.0.0.1. This is often used by processes to contact servers on the local machine, and consequently special handling for jails was needed. To handle this case it was necessary to also intercept and modify the behaviour of connection establishment: when the 127.0.0.1 address is seen from a jailed process, the jail's configured IP number is substituted.
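
The connect-time rewrite can be sketched the same way. `LOOPBACK_ADDR` and `connect_destination()` are hypothetical names for this illustration, and addresses are again host-byte-order integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOPBACK_ADDR 0x7F000001u    /* 127.0.0.1, host byte order */

/* Hypothetical, simplified structures. */
struct prison { uint32_t pr_ip; };
struct proc { struct prison *p_prison; };

/* Destination actually used when a process calls connect(). */
static uint32_t connect_destination(const struct proc *p, uint32_t dst)
{
    if (p->p_prison != NULL && dst == LOOPBACK_ADDR)
        return p->p_prison->pr_ip;   /* jailed "localhost" is the jail */
    return dst;
}
```

Combined with the bind-time substitution, a server inside the jail listening on the jail's address is reachable by a client in the same jail that connects to 127.0.0.1.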

Finally the APIs through which the network configuration and connection state may be queried were modified to report only information relevant to the configured IP number of a jailed process.
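
As one illustration, a netstat-style connection listing might be filtered on the local address. The `struct conn` layout and `visible_conns()` are hypothetical; the sketch only shows the idea of hiding entries unrelated to the jail's IP number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures. */
struct prison { uint32_t pr_ip; };
struct proc { struct prison *p_prison; };

struct conn {                  /* one entry of a connection listing */
    uint32_t laddr, faddr;     /* local / foreign IPv4, host byte order */
    uint16_t lport, fport;
};

/* Copy the entries a process may see into out[]; returns how many. */
static size_t visible_conns(const struct proc *p, const struct conn *all,
                            size_t n, struct conn *out, size_t outlen)
{
    size_t count = 0;

    for (size_t i = 0; i < n && count < outlen; i++) {
        if (p->p_prison != NULL && all[i].laddr != p->p_prison->pr_ip)
            continue;          /* not bound to the jail's address: hide */
        out[count++] = all[i];
    }
    return count;
}
```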

6.5. Adding jail awareness to selected device drivers.

A couple of device drivers needed to be taught about jails; the ``pty'' driver is one of them. The pty driver provides ``virtual terminals'' to services like telnet, ssh, rlogin and X11 terminal window programs. Jails therefore need access to the pty driver, and code had to be added to ensure that a particular virtual terminal is not accessed from more than one jail at the same time.
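
One way to enforce that rule is to record which jail holds a pty while it is open. This is a sketch only: `struct pty`, `pty_open()` and `pty_close()` are hypothetical, the host is treated as one more distinct "jail", and EBUSY is an assumed error value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified structures. */
struct prison { int pr_ref; };
struct proc { struct prison *p_prison; };

struct pty {
    int            pt_opens;   /* outstanding opens of this pty */
    struct prison *pt_prison;  /* jail currently holding it, NULL = host */
};

/* Refuse an open from a different jail while the pty is in use. */
static int pty_open(struct pty *tp, const struct proc *p)
{
    if (tp->pt_opens > 0 && tp->pt_prison != p->p_prison)
        return EBUSY;
    tp->pt_prison = p->p_prison;
    tp->pt_opens++;
    return 0;
}

/* On the last close the pty becomes available to any jail again. */
static void pty_close(struct pty *tp)
{
    assert(tp->pt_opens > 0);
    if (--tp->pt_opens == 0)
        tp->pt_prison = NULL;
}
```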

6.6. General restriction of super-users powers for jailed super-users.

This item proved to be the simplest but most tedious to implement: tedious because it called for a manual review of every place where the kernel grants the super-user special powers, simple because very few of those places needed to let a jailed root through. Of the approximately 260 checks in the FreeBSD 4.0 kernel, only about 35 will let a jailed root through.

Since the default is for jailed roots not to receive privilege, new code or drivers in the FreeBSD kernel are automatically jail-aware: they will refuse jailed roots privilege. The other part of this protection comes from the fact that a jailed root cannot create new device nodes with the mknod(2) system call, so unless the machine administrator creates device nodes for a particular device inside the jail's filesystem tree, the driver in effect does not exist in the jail.

As a side-effect of this work, the suser(9) API was cleaned up and extended to cater not only for the jail facility, but also to make room for future partitioning facilities.
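
The default-deny policy can be expressed as a privilege check that takes a flag. The function `priv_check_sketch()` and the flag `ALLOW_JAILED_ROOT` are hypothetical names chosen for this sketch and are not the actual suser(9) interface. A call site that does not pass the flag, which includes all newly written code, automatically refuses a jailed root.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ALLOW_JAILED_ROOT 0x1   /* hypothetical: caller opts jailed root in */

/* Hypothetical, simplified structures. */
struct prison { int pr_ref; };
struct proc { unsigned p_uid; struct prison *p_prison; };

/* Returns 0 if p holds super-user privilege for this check. */
static int priv_check_sketch(const struct proc *p, int flag)
{
    if (p->p_uid != 0)
        return EPERM;           /* not root at all */
    if (p->p_prison != NULL && !(flag & ALLOW_JAILED_ROOT))
        return EPERM;           /* default: jailed root is not privileged */
    return 0;
}
```

In this model, the roughly 35 checks that let a jailed root through would be the only call sites passing `ALLOW_JAILED_ROOT`.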

6.7. Implementation statistics.

The change of the suser(9) API modified approx. 350 source lines distributed over approx. 100 source files. The vast majority of these changes were generated automatically with a script.

The implementation of the jail facility added approx. 200 lines of code distributed over approx. 50 files, and about 200 lines in two new kernel files.