Date:      Wed, 14 Jul 2004 15:29:35 -0400 (EDT)
From:      Robert Watson <rwatson@freebsd.org>
To:        Alfred Perlstein <alfred@freebsd.org>
Cc:        cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/kern kern_event.c src/sys/sys eventvar.h
Message-ID:  <Pine.NEB.3.96L.1040714151328.56002D-100000@fledge.watson.org>
In-Reply-To: <20040714191002.GE95729@elvis.mu.org>


On Wed, 14 Jul 2004, Alfred Perlstein wrote:

> Ok, I'll back it out.  In the future I would appreciate a simple "we
> actually have code in progress to do this" rather than what I thought
> was wishful thinking. 
> 
> I know your brain is full right now, but is there any way you see this
> actually working?  I know how to avoid the recursion, but, buh..  I
> think sigio needs an overhaul.  I'll see if that works, for now I'll
> just #if 0 the ioctl code to enable it. 

Sorry about that, I should have explained more fully in my original e-mail
what the current state of the universe was.  I'm not inherently opposed
to the notion of pgsigio getting fired in response to kqueue events, and I
can see potential utility there, but I worry mostly about the immediate
implementation consequences.

I agree that the problem seems to be with pgsigio, and perhaps more
generally with our notion of event management in the kernel.  We have a
lot of different event notification systems -- a perfect example of the
level of suffering is sowakeup(), which I've slightly simplified here: 

        /* (1) select()/poll() */
        selwakeuppri(&sb->sb_sel, PSOCK);
        sb->sb_flags &= ~SB_SEL;
        /* (2) threads sleeping in sbwait() */
        if (sb->sb_flags & SB_WAIT) {
                sb->sb_flags &= ~SB_WAIT;
                wakeup(&sb->sb_cc);
        }
        /* (3) kqueue */
        KNOTE(&sb->sb_sel.si_note, 0);
        /* (4) SIGIO to the owning process or process group */
        if ((so->so_state & SS_ASYNC) && so->so_sigio != NULL)
                pgsigio(&so->so_sigio, SIGIO, 0);
        /* (5) in-kernel socket upcalls */
        if (sb->sb_flags & SB_UPCALL)
                (*so->so_upcall)(so, so->so_upcallarg, M_DONTWAIT);
        /* (6) AIO */
        if (sb->sb_flags & SB_AIO)
                aio_swake(so, sb);

(This is a "See also: abstraction" sort of code quote)

We send out notifications to select, socket buffer wakeups, kqueue,
pgsigio, socket upcalls, and AIO.  Each of these blithely assumes it's
basically a "leaf subsystem", in that it's OK to call it at any point
from any place that generates an event of interest.  The problem you
bumped into is one that I've already been running into with the socket
upcalls: sometimes the so-called "leaf" subsystems go off and do a lot
of stuff -- for example, call back into the socket layer, send signals,
and so on.  Kqueue is interesting in that, other than recursing back
into the calling subsystem to test the filter, it layers on top of
relatively few other subsystems.  By adding a layer above signaling, it
ceases to be quite so leafy, which makes it a lot harder to deal with.
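
To make the "not so leafy" hazard concrete, here's a tiny userspace
sketch (purely illustrative, not kernel code; all the names are made up)
of what goes wrong when a consumer that the producer treats as a leaf
turns around and re-enters the producer while the producer's lock is
still held:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/*
 * A "producer" that fires callbacks while holding its own lock, the
 * way sowakeup() runs with the socket/buffer locked.
 */
struct event_source {
        pthread_mutex_t lock;
        void            (*consumer)(struct event_source *);
};

/*
 * A well-behaved "leaf" consumer would just note the event and return.
 * This one re-enters the producer, the way a non-leafy notification
 * path can call back into the layer that notified it.
 */
static void
noisy_consumer(struct event_source *src)
{
        int error;

        error = pthread_mutex_lock(&src->lock);         /* re-entry! */
        if (error == EDEADLK)
                printf("consumer re-entered producer: EDEADLK\n");
        else if (error == 0)
                pthread_mutex_unlock(&src->lock);
}

static void
fire_event(struct event_source *src)
{

        pthread_mutex_lock(&src->lock);
        src->consumer(src);             /* notification with lock held */
        pthread_mutex_unlock(&src->lock);
}

int
main(void)
{
        struct event_source src;
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&src.lock, &attr);
        src.consumer = noisy_consumer;
        fire_event(&src);
        return (0);
}

In the kernel the failure mode is a lock recursion panic or a deadlock
rather than a polite EDEADLK, but the shape of the problem is the same.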

I like Doug's notion of using a task to handle some of the callbacks in a
safe way.  We're already using this notion to simplify locking in the Jail
code, so as to avoid vrele() requiring Giant in the process context, and
I'm using it now in SLIP to perform the slstart() from a context that can
safely acquire Giant so it can enter the tty subsystem.  Asynchrony, where
sensible, can provide a clean solution to lock recursion/ordering issues. 
We use it for GEOM up/down threads, with ktrace, routing sockets, and so
on.
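
As a rough sketch of the pattern (hypothetical and untested, with
invented driver names), deferring a notification through taskqueue(9)
looks something like this:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/queue.h>
#include <sys/taskqueue.h>

struct foo_softc {
        struct task     sc_notify_task;         /* lives in the softc */
        /* ... device state ... */
};

/*
 * Runs later from a taskqueue thread, where it's safe to acquire
 * Giant, enter the tty code, post signals, and so on.
 */
static void
foo_notify_task(void *arg, int pending)
{
        struct foo_softc *sc = arg;

        /* Do the heavyweight notification work here. */
        (void)sc;
}

static void
foo_attach(struct foo_softc *sc)
{

        TASK_INIT(&sc->sc_notify_task, 0, foo_notify_task, sc);
}

/*
 * Called from the event path, possibly with locks held: just queue
 * the work rather than doing the notification inline.
 */
static void
foo_event(struct foo_softc *sc)
{

        taskqueue_enqueue(taskqueue_thread, &sc->sc_notify_task);
}

The important properties are that the work runs in a well-defined
thread context and that the 'struct task' has somewhere stable to live.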

However, it has some limitations: it discards a lot of useful stack
state (and other context) that helps with debugging, and it adds delays
in notification (potentially somewhat unbounded), which can itself open
up races.  It could well be that simply inserting a task queue here
solves the problem, but we'll want to be careful.  For example, in the
SLIP code and the Jail code, the 'struct task' used to perform the
asynchronous notify is stored in the softc (or struct prison, which is
basically a Jail softc).  In the Jail case, we know the containing
prison can't be free()'d out from under the task, because freeing it is
exactly what the asynchronous notify does.  However, in the SLIP case I
currently have no way to guarantee that the softc won't be free()'d
between the event being scheduled and the task running.  Making sure
this problem is cleanly addressed is going to be important in
situations like pgsigio(), which involve pretty transient entities...
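
For what it's worth, one way to close that window in the SLIP-style
case would be to make teardown wait for the task before the softc goes
away.  Continuing the hypothetical sketch above, and assuming a
taskqueue_drain()-style primitive is available, the detach path might
look roughly like:

#include <sys/malloc.h>         /* in addition to the includes above */

static void
foo_detach(struct foo_softc *sc)
{

        /* First stop generating new events (driver-specific). */

        /* Wait for any queued or running instance of the task. */
        taskqueue_drain(taskqueue_thread, &sc->sc_notify_task);

        /* Assumes the softc was malloc'd with M_DEVBUF. */
        free(sc, M_DEVBUF);
}

Something equivalent would be needed for more transient entities like
the sigio registrations you're dealing with.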

FYI, here's my current thinking on kqueue locking -- I'm currently
picking up Brian's pieces to update them to recent CURRENT and make
sure they work well in my test environment.  From my perspective, I'd
prefer a data-based approach, but John-Mark has not yet been able to
shake his locking work free.  If he does quickly, maybe we go with
that, but Brian's code appears to largely work now, and it is a strict
dependency for running the network stack Giant-free out of the box.
As it stands, if you run the socket code without Giant today, moderate
kqueue use causes a panic -- very similar to the panic you get if you
use kqueues during a make world today.  As such, we need to get
something done soon so we can get more testing exposure for the
network stack without Giant.  I'd like to get something, be it Brian's
or John-Mark's code, merged in the next two weeks.  The reason I pick
two weeks is that in two weeks I'm going on vacation to India for two
weeks, so the sooner it's done, the more will be ready before I go
away.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research




