From: Robert Watson <robert@fledge.watson.org>
Date: Sat, 25 Oct 2003 21:37:28 -0400 (EDT)
To: Matthew Dillon
Cc: John-Mark Gurney, hackers@freebsd.org, Kip Macy, Marcel Moolenaar
Subject: Re: FreeBSD mail list etiquette
In-Reply-To: <200310252213.h9PMDCHq032546@apollo.backplane.com>

On Sat, 25 Oct 2003, Matthew Dillon wrote:

> It's a lot easier lockup path than the direction 5.x is going, and
> a whole lot more maintainable IMHO because most of the coding doesn't
> have to worry about mutexes or LORs or anything like that.

You still have to be pretty careful, though, when relying on implicit
synchronization: while it works well deep in a subsystem, it can break down
at subsystem boundaries.  One of the challenges I've been bumping into
recently when working with Darwin has been the split between their Giant
kernel lock and their network lock.

To give a high-level summary of the architecture: they have two funnels,
which behave similarly to the Giant lock in -STABLE/-CURRENT -- when you
block, the lock is released, allowing other threads to enter the kernel,
and is regained when the thread starts to execute again.  They then have
fine-grained locking for the Mach-derived components, such as memory
allocation, VM, et al.

Deep in a particular subsystem -- say, the network stack -- everything works
fine.  The problem is at the boundaries, where structures are shared between
multiple compartments.  For example, process credentials are referenced by
both "halves" of the Darwin BSD kernel code and are insufficiently protected
in the current implementation: they have a write lock but no read lock, so
it looks like it should be possible to get stale references when the pointer
is read under two different locks.  Similarly, there's the potential for
serious problems at the surprisingly frequently occurring boundaries between
the network subsystem and the remainder of the kernel: file descriptor
related code, fifos, BPF, et al.  By using two large subsystem locks, they
do simplify locking inside each subsystem, but the result rests on a web of
implicit assumptions and boundary synchronization that carries most of the
risks of explicit locking.
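To make the stale-reference problem concrete, here's a minimal userland
sketch.  The pthread mutexes stand in for the two funnels and for the
credential write lock, and the structure and function names are invented
for illustration -- this is not Darwin's actual code.  The point is that
the credential pointer is protected for writers only, so readers running
under two different subsystem locks can see it stale or already freed:

/*
 * Illustrative sketch only.  The credential pointer is write-locked but
 * not read-locked, so readers holding either "funnel" race the update.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct cred_sketch {
	int	cr_uid;
};

static pthread_mutex_t cred_write_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t kernel_funnel = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t network_funnel = PTHREAD_MUTEX_INITIALIZER;

/* Shared by both "halves" of the kernel. */
static struct cred_sketch *proc_cred;

/* Writer: swaps the credential under the write lock, then frees the old one. */
static void
cred_update(int newuid)
{
	struct cred_sketch *newcred, *oldcred;

	newcred = malloc(sizeof(*newcred));
	newcred->cr_uid = newuid;

	pthread_mutex_lock(&cred_write_lock);
	oldcred = proc_cred;
	proc_cred = newcred;
	pthread_mutex_unlock(&cred_write_lock);

	free(oldcred);			/* a reader may still hold this pointer */
}

/* Reader in the BSD half: holds only the kernel funnel. */
static void *
syscall_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&kernel_funnel);
	struct cred_sketch *cred = proc_cred;	/* unsynchronized read */
	printf("syscall path sees uid %d\n", cred->cr_uid);	/* may be freed */
	pthread_mutex_unlock(&kernel_funnel);
	return (NULL);
}

/* Reader in the network half: holds only the network funnel. */
static void *
network_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&network_funnel);
	struct cred_sketch *cred = proc_cred;	/* unsynchronized read */
	printf("network path sees uid %d\n", cred->cr_uid);	/* may be freed */
	pthread_mutex_unlock(&network_funnel);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	proc_cred = malloc(sizeof(*proc_cred));
	proc_cred->cr_uid = 0;

	pthread_create(&t1, NULL, syscall_path, NULL);
	pthread_create(&t2, NULL, network_path, NULL);
	cred_update(1001);		/* races with both readers */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return (0);
}

With explicit locking, the fix is at least easy to state: a read lock, or a
reference count manipulated under the same lock that protects the pointer.
With the funnel model alone, neither reader is required to hold anything the
writer holds, so nothing forces the interleaving to be safe.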
It's also worth noting that there have been some serious bugs associated
with the lack of explicit synchronization in the non-concurrent kernel model
used in RELENG_4 (and a host of other early UNIX systems relying on a single
kernel lock).  These have to do with unexpected blocking deep in a function
call stack, not anticipated by a developer writing source code higher in the
stack, resulting in race conditions.  In the past, there have been a number
of exploitable security vulnerabilities due to races opened up in low memory
conditions, during paging, etc.

One solution I was exploring was using the compiler to help track the
potential for functions to block, similar to the const qualifier, combined
with blocking/non-blocking assertions evaluated at compile time.  However,
some of our current APIs (M_NOWAIT, M_WAITOK, et al) make that approach
somewhat difficult to apply, and they would have to be revised for a
compiler solution to work (a rough sketch of what such a revision might look
like is below).  These potential weaknesses very much exist in an explicit
model as well, but with explicit locking we have a clearer notion of how to
express assertions.

In -CURRENT, we make use of thread-based serialization in a number of places
to avoid explicit synchronization costs (such as in GEOM for processing work
queues), and we should make more use of this practice.  I'm particularly
interested in the use of interface interrupt threads performing direct
dispatch as a means to maintain the ordering of packets coming in on a
network interface while allowing parallelism in network processing (you'll
find this in use in Sam's netperf branch currently).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
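The sketch mentioned above: this is purely hypothetical.  The __may_block
qualifier and the kmalloc_waitok/kmalloc_nowait names are invented for the
example and are not existing interfaces; the stubs over the userland
malloc() are only there so the sketch compiles and runs as plain C.

/*
 * Hypothetical sketch of a compiler-checked split of the allocator API.
 * None of these names exist today.
 */
#include <stdio.h>
#include <stdlib.h>

/* Would be a real, compiler-tracked qualifier, propagated like const. */
#define __may_block

struct malloc_type_sketch { const char *desc; };

/*
 * Today, whether malloc(9) can sleep is chosen by a run-time flag
 * (M_WAITOK vs. M_NOWAIT), so a static checker cannot classify a call
 * site.  Splitting the interface moves that information into the
 * signature, where a qualifier could track it:
 */
static void * __may_block
kmalloc_waitok(size_t size, struct malloc_type_sketch *type)
{
	(void)type;
	return (malloc(size));	/* stands in for an allocation that may sleep */
}

static void *
kmalloc_nowait(size_t size, struct malloc_type_sketch *type)
{
	(void)type;
	return (malloc(size));	/* stands in for a non-sleeping attempt */
}

/*
 * The qualifier would propagate up the call graph the way const does, so
 * a __may_block call reached from a function asserted non-blocking (an
 * interrupt handler, say) could be rejected at compile time.
 */
static void * __may_block
prepare_buffer(struct malloc_type_sketch *mt)
{
	return (kmalloc_waitok(128, mt));	/* legal: caller is __may_block */
}

int
main(void)
{
	struct malloc_type_sketch mt = { "sketch" };
	void *p = prepare_buffer(&mt);
	void *q = kmalloc_nowait(64, &mt);

	printf("waitok-style: %p, nowait-style: %p\n", p, q);
	free(p);
	free(q);
	return (0);
}

Once the blocking behaviour lives in the signature rather than in a run-time
flag, it can be propagated and checked the same way const is, which is what
makes the compile-time assertion idea tractable.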