From owner-cvs-src@FreeBSD.ORG Wed Jul 14 19:29:54 2004 Return-Path: Delivered-To: cvs-src@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 11DEB16A4CE; Wed, 14 Jul 2004 19:29:54 +0000 (GMT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 96AC243D5F; Wed, 14 Jul 2004 19:29:53 +0000 (GMT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.12.11/8.12.11) with ESMTP id i6EJTaNm059552; Wed, 14 Jul 2004 15:29:36 -0400 (EDT) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)i6EJTa8i059549; Wed, 14 Jul 2004 15:29:36 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Wed, 14 Jul 2004 15:29:35 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Alfred Perlstein In-Reply-To: <20040714191002.GE95729@elvis.mu.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: cvs-src@freebsd.org cc: src-committers@freebsd.org cc: cvs-all@freebsd.org Subject: Re: cvs commit: src/sys/kern kern_event.c src/sys/sys eventvar.h X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Jul 2004 19:29:54 -0000 On Wed, 14 Jul 2004, Alfred Perlstein wrote: > Ok, I'll back it out. In the future I would appreciate a simple "we > actually have code in progress to do this" rather than what I thought > was wishful thinking. > > I know your brain is full right now, but is there any way you see this > actually working? I know how to avoid the recursion, but, buh.. I > think sigio needs an overhaul. I'll see if that works, for now I'll > just #if 0 the ioctl code to enable it. Sorry about that, I should have explained more fully in my original e-mail what the current state of the universe was. I'm not inherrently opposed to the notion of pgsigio getting launched relating to kqueue events, and I can see potential utility there, but I worry mostly about the immediate implementation consequences. I agree that the problem seems to be with pgsigio, and perhaps more generally with our notion of event management in the kernel. We have a lot of different event notification systems -- a perfect example of the level of suffering is sowakeup(), which I've slightly simplified here: selwakeuppri(&sb->sb_sel, PSOCK); sb->sb_flags &= ~SB_SEL; if (sb->sb_flags & SB_WAIT) { sb->sb_flags &= ~SB_WAIT; wakeup(&sb->sb_cc); } KNOTE(&sb->sb_sel.si_note, 0); if ((so->so_state & SS_ASYNC) && so->so_sigio != NULL) pgsigio(&so->so_sigio, SIGIO, 0); if (sb->sb_flags & SB_UPCALL) (*so->so_upcall)(so, so->so_upcallarg, M_DONTWAIT); if (sb->sb_flags & SB_AIO) aio_swake(so, sb); (This is a "See also: abstraction" sort of code quote) We send out notifications to select, socket buffer wakeups, kqueue, pgsigio, socket upcalls, and AIO. Each of these blithly assumes it's basically a "leaf subsystem" in that it's OK to call it at any point from any place that generates an event of interest. The problem you bumped into is one that I've already been running into with the socket upcalls: sometimes the so-called "leaf" subsystems go off and do a lot of stuff. For example, call back into the socket layer, send signals, etc. Kqueue is interesting in that, other than recursing back into the calling system to test the filter, it layers on top of relatively few other subsystems. By adding a layering above signaling, it ceases to become so very leafy, which makes it a lot harder to deal with. I like Doug's notion of using a task to handle some of the callbacks in a safe way. We're already using this notion to simplify locking in the Jail code, so as to avoid vrele() requiring Giant in the process context, and I'm using it now in SLIP to perform the slstart() from a context that can safely acquire Giant so it can enter the tty subsystem. Asynchrony, where sensible, can provide a clean solution to lock recursion/ordering issues. We use it for GEOM up/down threads, with ktrace, routing sockets, and so on. However, it has some limitations, in that it discards a lot of useful stack state for debugging, adds delays in notification (potentially somewhat unbounded) which can open up races itself, as well as other useful context. It could well be simply inserting a task queue here solves the problem, but we'll want to be careful. For example, in the SLIP code and Jail code, the 'struct task' used to perform the asynchronous notify is stored in the softc (or struct prison, which is basically a Jail softc). In the Jail case, we know that the container prison can't be free()'d, because that's what the asynchronous notify does. However, in the SLIP case I currently have no way to guarantee that the softc won't be free'd between the event being scheduled and running. Making sure this problem is cleanly addressed is going to be important in situations like pgsigio(), which involve pretty transient entities... FYI, here's my current thinking on kqueue locking -- I'm currently picking up Brian's pieces to update them to recent CURRENT and make sure they work well in my test environment. From my perspective, I'd prefer a data-based approach, but John-Mark has not yet been able to shake free his locking. If he does quickly, maybe we go with that, but Brian's code appears to largely work now, and is a strict dependency to run the network stack Giant-free out of the box. As it stands, if you run the socket code without Giant today, moderate kqueue use causes a panic -- very similar to the panic you get if you use kqueues with make world today. As such, we need to get something done soon so we can get more testing exposure of the network stack without Giant. I'd like to get something, be it Brian's or John-Mark's code, merged in the next two weeks. The reason I pick two weeks is that in two weeks, I'm going on vacation to India for two weeks, so the sooner it's done, the more is ready before I go away. Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Principal Research Scientist, McAfee Research