Date:      Sun, 20 Jul 2003 13:32:59 -0400 (EDT)
From:      Robert Watson <rwatson@freebsd.org>
To:        Pawel Jakub Dawidek <nick@garage.freebsd.pl>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Communications kernel -> userland
Message-ID:  <Pine.NEB.3.96L.1030720125403.91635A-100000@fledge.watson.org>
In-Reply-To: <20030719074707.GB437@garage.freebsd.pl>

On Sat, 19 Jul 2003, Pawel Jakub Dawidek wrote:

> Your choices are:
> - device,
> - sysctl,
> - syscall.

There are actually a few other more obscure ways to push information from
the kernel to userspace, depending on what you want to accomplish.

Write directly to a file from the kernel.  ktrace, system accounting, and
ktr with alq all stream data directly to a file provided by an authorized
user process.  Quotas and UFS1 extended attribute data are also written
directly to a file.  On other operating systems, audit implementations
frequently take the same approach -- when the goal is long-term storage
of data in a user-accessible form, but you don't want to stream it
through a user process live, this is usually the preferred approach.
Typically, a special system call notifies the kernel of the target file
to write to; the file is created by the user process with appropriate
protections.  Often, but not always, the system call is non-blocking: it
simply returns once the file is hooked up as a target, and delivery
continues until another system call cancels it or switches it to a new
target.
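
From the user side, the file-target pattern might look roughly like the
sketch below.  It uses acct(2), the system accounting call, as the
example: the process creates the file with restrictive permissions, then
hands the path to the kernel, and a later call with NULL cancels
delivery.  The helper names (create_target, start_accounting,
stop_accounting) and the choice of mode 0600 are assumptions made for
illustration, and acct() needs appropriate privilege to succeed.

```c
#define _DEFAULT_SOURCE	/* expose acct() under glibc; harmless elsewhere */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the target file with restrictive
 * permissions.  Returns 0 on success, -1 with errno set on failure.
 */
int
create_target(const char *path)
{
	int fd, saved;

	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0)
		return (-1);
	/* Force the mode in case the file existed or umask altered it. */
	if (fchmod(fd, 0600) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return (-1);
	}
	return (close(fd));
}

/*
 * Hypothetical helper: hook the file up as the accounting target.
 * acct() returns as soon as the kernel has accepted the file; records
 * are then written by the kernel, not by this process.
 */
int
start_accounting(const char *path)
{
	if (create_target(path) < 0)
		return (-1);
	return (acct(path));
}

/* Hypothetical helper: a second call with NULL cancels delivery. */
int
stop_accounting(void)
{
	return (acct(NULL));
}
```

Note that the calling process never sees the records itself; it only
sets up and tears down the target, which is the point of this approach.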

Stream it through a device node.  If you need only one or a small number
of processes to listen for events from the kernel, a common approach
is a pseudo-device that acts like a file.  For example, syslogd listens
on /dev/klog for log events from the kernel; some audit implementations
also take this approach.  Our devd, usbd, and others similarly listen
for system events that are exposed to user processes as data on a
blocking pseudo-device.  One nice thing about this approach is that you
can combine it with select(), kqueue(), et al., to do centralized event
management in the application.  BPF also does this.  Both Arla and
Coda take this approach for LPC'ing to userspace to request events
as a result of VFS operations by processes.
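
A listener for such a device might be sketched as follows.  This is only
an illustration: it uses poll() as a stand-in for select() or kqueue(),
the function name read_event is hypothetical, and the descriptor is
assumed to have been opened by the caller (for example on a node such as
/dev/klog).  The record format is device-specific and not shown.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for data on
 * fd, then read at most len bytes into buf.  Returns the number of
 * bytes read, 0 on timeout or end-of-file, or -1 on error.
 */
ssize_t
read_event(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Restart the wait if a signal interrupts it. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);
	if (n <= 0)
		return (n);	/* 0: timed out, -1: poll failed */

	return (read(fd, buf, len));
}
```

Because the device is just a file descriptor, the same pollfd array can
carry sockets and other descriptors, so the application handles kernel
events in its ordinary event loop.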

Expose it using a special socket type.  We expose routing data and
network stack administrative controls as special reads, writes, and
ioctls on various socket types.  I'm not a big fan of this approach,
as it special-cases a lot of bits, and requires you to get caught
up in socket semantics.  However, one advantage of this approach is
it makes the notion of multicast of events to multiple listeners easier
to deal with, since each socket endpoint has automatic message buffering.

There are some other odd cases in use as well.  The NFS locking code
opens a specially named fifo (/var/run/lock) and writes messages to
it, which are picked up by rpc.lockd.  The lock daemon pushes events
back into the kernel using a special system call.  I don't really
like this approach, as it has some odd semantics -- especially since
it reopens the fifo for each operation, and there are credential/
file system namespace inconsistencies.

Of these approaches, my favorites are writing directly to a file and using
a pseudo-device, depending on the requirements.  They have fairly
well-defined security semantics (especially if you properly cache the
open-time credentials in the file case).  I don't really like the fifo
case as it has to re-look-up the fifo each time, and has some odd blocking
semantics.  Sockets, as I said, involve a lot of special casing, so unless
you're already dealing with network code, you probably don't want to drag
it into the mix.  If you're creating big new infrastructure for a feature,
I suppose you could also hook it up as a first class object at the file
descriptor level, in the style of kqueue.  If it's relatively minor event
data, you could hook up a new kqueue event type.  You could also just use
a special-purpose system call or sysctl if you don't mind a lot of context
switching and lack of buffering. 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
