Date:      Fri, 31 Oct 2003 01:44:17 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        andi payn <andi_payn@speedymail.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: kevent and related stuff
Message-ID:  <3FA22EF1.39B64387@mindspring.com>
References:  <1067529247.36829.2138.camel@verdammt.falcotronic.net>

andi payn wrote:
> First, let me mention that I'm not nearly as experienced coding for *BSD
> as for linux, so I may ask some stupid questions.
> 
> I've been looking at the fam port, and this has brought up a whole slew
> of questions. I'm not sure if all of them are appropriate to this list,
> but I don't know who else to ask, so....
> 
> First, some background: On Irix and Linux, fam works by asking the
> kernel to send it a signal whenever the specified accesses occur. On
> FreeBSD, since there is no imon interface and no dnotify fcntl, it
> instead works by periodically stating all of the files it's
> watching--which is obviously not as good. The fam FAQ suggests that
> FreeBSD users should adapt fam to use the kevent interface.

Yes.  The "file access monitor" tool is the classic argument.


> I looked into kevent, and it seems like there are a number of problems
> that lead me to suspect that this is a really stupid idea. And yet, I'd
> assume that someone on the fam team at SGI and/or one of the outside fam
> developers would know FreeBSD at least as well as me. Therefore, I'm
> guessing I'm missing something here. So, any ideas anyone can offer
> would be very helpful.
> 
> So, here's the questions I have:
> 
> * I think (but I'm not sure) that kevent doesn't notify at all if the
> only change to a file is its ATIME. If I'm right, this makes kevent
> completely useless for fam. Adding a NOTE_ACCESS or something similar
> would fix this. Since I'm pretty new to FreeBSD: What process do I go
> through to figure out whether anyone else wants this, whether the
> interface I've come up with is acceptable, etc.? And, once I write the
> code, do I submit it as a pr?

You add it, submit it as a PR, if send-pr will work from your
machine properly, discuss it on the lists, and if someone with
a commit bit has the time and likes the idea, it will be committed.


> * The kevent mechanism doesn't monitor directories in a sufficient way
> to make fam happy. If you change a file in a directory that you're
> watching, unlike imon or dnotify, kevent won't see anything worth
> reporting at all. This means that for directory monitoring, kevent is
> useless as-is. Again, if I wanted to patch kevent to provide this
> additional notification, would others want this?

I'm not sure that this is correct, unless you are talking about
monitoring all files in a directory by merely monitoring the
directory.  If you make a modification to the file metadata (e.g.
add a link or rename it), then you will be notified that the
directory has changed.

The argument against subhierarchy monitoring is that it will, by
definition, stop at the directory level, and it can not be
successfully implemented for all FS types.
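
A minimal sketch of the directory-level notification described above,
assuming a system that provides kqueue (the BSDs, Darwin).  The helper
name watch_dir_once() and the HAVE_KQUEUE_SKETCH macro are hypothetical
and exist only for this illustration; on systems without kqueue the
helper is a stub that returns -1.

```c
#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE_SKETCH 1    /* hypothetical macro for this sketch */
#endif

/*
 * Hypothetical helper: wait up to timeout_sec seconds for a change to
 * the directory itself (entry added, removed, renamed, linked).
 * Returns the reported fflags (> 0), 0 on timeout, or -1 on error or
 * when kqueue is not available.
 */
long watch_dir_once(const char *path, int timeout_sec)
{
#ifdef HAVE_KQUEUE_SKETCH
    struct kevent change, event;
    struct timespec ts = { timeout_sec, 0 };
    long result = -1;
    int dfd, kq, n;

    dfd = open(path, O_RDONLY);
    if (dfd < 0)
        return -1;

    kq = kqueue();
    if (kq >= 0) {
        /* EV_CLEAR resets the note state after each retrieval. */
        EV_SET(&change, dfd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
            NOTE_WRITE | NOTE_LINK | NOTE_RENAME | NOTE_DELETE, 0, NULL);

        /* Register the filter and block for one event in a single call. */
        n = kevent(kq, &change, 1, &event, 1, &ts);
        if (n == 0)
            result = 0;
        else if (n > 0 && !(event.flags & EV_ERROR))
            result = (long)event.fflags;
        close(kq);
    }
    close(dfd);
    return result;
#else
    (void)path;
    (void)timeout_sec;
    return -1;
#endif
}
```

Note that this watches only the directory vnode.  A write to the
contents of an existing file inside the directory would need its own
EVFILT_VNODE registration on that file's descriptor.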


> * When two equivalent events appear in the queue, kevent aggregates
> them. This means that if there are two updates to a file since the last
> time you checked, you'll only see the most recent one. For some uses of
> fam (keeping a folder window up to date), this is what you want; for
> others (keeping track of how often a file is read), this is useless. The
> only solution I can think of is to add an additional flag, or some other
> way to specify that you want duplicated events.

This is the classic "edge triggered vs. level triggered" argument
that Linux people bring up every time someone suggests they implement
kqueue in Linux.

This is easily fixable: you separate the flag from the data, adding
an additional argument to KNOTE().  This also has the side effect
of removing the restriction on the PID size, which is imposed by
the limited number of bits left over for representing the PID.

This is a trivial change, and I've done it several times.

The way this works is that you establish, via definition of the
udata argument, a contract between the kernel and the user space
over what the udata means.  The additional argument to KNOTE can
then be used by the per event note handling code in the kernel
to fill out a udata structure with as much data as you want to
give it, and to identify the place in user space to copy it out
to.

For example, you could set up an accept filter to accept up to 10
connections at a time, and return the fd's into the user space
structure's int [10] array and fill out the int count value with
how many were returned.

For your case, you could use it to copy out each and every event
instance, rather than aggregating the events.
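
The difference between the two behaviours can be modelled in plain
user-space C.  This is only a sketch of the idea, not kernel code: the
structure names, the field layout, and the capacity of 10 are
assumptions made for illustration, and a real implementation would
fill the structure in the kernel and copy it out to user space.

```c
#include <stddef.h>

#define CONTRACT_MAX_EVENTS 10  /* hypothetical capacity */

/* Aggregating model: repeated notes collapse into one bit mask. */
struct agg_note {
    unsigned int fflags;
};

/*
 * Hypothetical contract structure: what the udata pointer is agreed
 * to refer to.  Each event instance gets its own slot.
 */
struct contract_udata {
    int          count;                        /* slots filled so far */
    unsigned int events[CONTRACT_MAX_EVENTS];  /* one entry per event */
};

/* Aggregating post: two identical hints leave the same single bit set. */
void agg_post(struct agg_note *n, unsigned int hint)
{
    n->fflags |= hint;
}

/*
 * Contract post: record every instance.
 * Returns 0 on success, -1 when the structure is full.
 */
int contract_post(struct contract_udata *u, unsigned int hint)
{
    if (u->count >= CONTRACT_MAX_EVENTS)
        return -1;
    u->events[u->count++] = hint;
    return 0;
}
```

Posting the same hint twice leaves agg_note with one bit set and no
record of the repeat, while contract_udata ends up with a count of 2,
which is what a "how often was this file read" consumer needs.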


> * Unlike imon and dnotify, kevent doesn't provide any kind of callback
> mechanism; instead, you have to poll the queue for events. Would it be
> useful to specify another flag/parameter that would tell the kernel to
> signal the monitoring process whenever an event is available? (It would
> certainly make the fam code easier to write--but if it's not useful
> anywhere else, that's probably not enough.)

You can SIGPOLL on the event descriptor returned by kqueue().  You
can use it in a select() or poll() call.  You can pass it to another
kqueue() as an EVFILT_READ event.

Sending signals ("callbacks") is probably the absolutely least
efficient way of getting the notification back.  The presumption
here (and it's likely a good one) is that, rather than polling,
your application will be event driven, and get the events by
blocking and waiting for them.
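
A minimal sketch of the "block and wait" approach using poll() on a
descriptor.  The helper name wait_readable() is hypothetical; it works
on any pollable descriptor, and the assumption here is that the
descriptor passed in would be the one returned by kqueue().

```c
#include <poll.h>

/*
 * Hypothetical helper: block until fd is readable or timeout_ms
 * elapses (a negative timeout blocks indefinitely).
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

An event-driven program would sit in a call like this (or directly in
kevent() with a NULL timeout) and only wake up when there is work to
do, which avoids both periodic polling and signal delivery.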


> * The kevent vnode stuff apparently only works on UFS. And it looks like
> it would be a major project to port it to other filesystems.

It's actually pretty trivial, so long as you know the FS you are
porting it to.  The notifications for many things should probably
be migrated to the VFS layer, instead (see Darwin's implementation).

> Would this be useful for anything other than improving fam?

Sure; it would be as useful for other FS's as it's useful in UFS
now.  The primary utility (IMO) is for GUI interfaces which want
to update their displays in as close to real time as possible.


> What about a port of
> the imon kernel interface (and/or the dnotify fcntl) to FreeBSD instead?

The way Linux people feel about "edge vs. level triggered" kqueue
is about the same way FreeBSD people feel about "dnotify"... but
there's no obvious way to fix the complaints about "dnotify".

imon is pretty useless; if it were implemented at all, the way
to do it is in terms of kevents.  Either way, you are resolving the
kqueue "issue", so you might as well use kqueue.


> * The kqueue doesn't appear to have any maximum size. If this is true,
> the dnotify/fam problem where you get hideous errors from overflowing
> queues wouldn't be an issue, but you could instead end up wasting
> massive amounts of memory in the kernel if you didn't get around to
> reading the queue.... Which is it?

This is not strictly true; with the non-contract kqueue interface, it
will aggregate the notes on a per object basis, so it's very hard to
overflow (you'd have to monitor more things than you have memory
available to monitor).

With the contract kqueue, you would do one of two things:

1)	Block the code in the KNOTE() addition until there was
	room on the queue

2)	Have a ring buffer in user space as part of the contract,
	and if it overflows, it overflows

Both of these boil down to "what do I do when I'm getting more
events in the kernel than user space is able to process before
buffer exhaustion sets in?".
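
Option 2 can be sketched as a fixed-size ring that counts what it had
to drop.  This is a single-threaded model under stated assumptions:
the names, the slot count of 8, and the "drop the newest event" policy
are all hypothetical, and a ring genuinely shared between the kernel
and user space would also need proper synchronization.

```c
#include <stddef.h>

#define RING_SLOTS 8    /* hypothetical size; must be a power of two */

struct event_ring {
    unsigned int head;              /* next slot the producer writes */
    unsigned int tail;              /* next slot the consumer reads */
    unsigned int dropped;           /* events lost to overflow */
    unsigned int slots[RING_SLOTS];
};

/*
 * Producer side.  When the ring is full the event is dropped and
 * counted ("if it overflows, it overflows").
 * Returns 0 if stored, -1 if dropped.
 */
int ring_put(struct event_ring *r, unsigned int ev)
{
    if (r->head - r->tail == RING_SLOTS) {
        r->dropped++;
        return -1;
    }
    r->slots[r->head % RING_SLOTS] = ev;
    r->head++;
    return 0;
}

/*
 * Consumer side.
 * Returns 0 and stores the oldest event in *ev, or -1 if the ring is empty.
 */
int ring_get(struct event_ring *r, unsigned int *ev)
{
    if (r->head == r->tail)
        return -1;
    *ev = r->slots[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

The dropped counter at least lets the consumer detect that it fell
behind; option 1 would instead block the producer at the point where
this sketch increments the counter.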

> Any answers, or pointers to where I can find these answers, would be
> greatly appreciated.

I don't pretend that my answers are authoritative; however, having
done what you want to do for two commercial companies now, I can
tell you the approach I describe (using a contract between the
kernel and user space, and separating the parameter from the flag
bits) will work.  8-).

In fact, if you check the -current archives from about two years
ago in June or July or so, you will see patches that perform this
separation, and which add support for System V IPC message queues
sending events, thereby allowing them to be selected/polled/etc.
via their kqueue descriptor.

-- Terry


