From: Julian Elischer
Date: Tue, 16 Aug 2005 20:20:17 -0700
To: Max Laier
Cc: net@freebsd.org, arch@freebsd.org
Subject: Re: duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c
Message-ID: <4302ACF1.6050209@elischer.org>
In-Reply-To: <200508170435.34688.max@love2party.net>

Max Laier wrote:

> On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote:
>
>>[apologies for the cross post but it belongs both to arch and net.]
>>
>>I notice that net/pfil.c and netinet/ip_fw2.c have two copies of
>>a similar but slightly different implementation of
>>multiple-reader/single-writer locks, which brings up the question(s):
>>
>>1. should we rather put this code in the generic kernel code so that other
>>   subsystems could make use of it ? E.g. the routing table is certainly
>>   a candidate,
>
> I have asked this several times on -arch and IRC, but never found anyone
> willing to pursue it.  However, the problem is ...
>
>>and especially
>>
>>2. should we implement it right ?
>>
>>   Both implementations are subject to starvation for the writers
>>   (which is indeed a problem here, because we might want to modify
>>   a ruleset and be prevented from doing it because of incoming traffic
>>   that keeps readers active).
>>   Also the PFIL_TRY_WLOCK will in fact be blocking if a writer
>>   is already in - i have no idea how problematic this is in the
>>   way it is actually used.
>
> ... really this.  I didn't find a clean way out of the starvation issue.  What
> I do for pfil is that I set a flag and simply stop serving[2] shared requests
> once a writer waits for the lock.  If a writer can't sleep[1] then we return
> EBUSY and don't.  However, for pfil it's almost always safe to assume that a
> write may sleep (as it is for most instances of this kind of sx-lock where
> you have BIGNUMxreads:1xwrite).
>
> [1] Note that there is a *big* difference between blocking and sleeping.
> These two are usually confused.  While it is almost always okay to block it
> is seldom okay to sleep.  The existing sx(9) api has the problem that it
> *sleeps* in the shared path which renders it unusable for this use case (as we
> might be holding other locks and must not sleep in the shared path).
> However, sleeping in the shared path is one (?the only?) way out of the
> starvation problem - other than a problem-specific solution as done for pfil.
>
> [2] See pfil(9) BUGS.

netgraph has yet another implementation of R/W locks.
It relies on the fact that every lock action is done on behalf of a
command request or a data processing request, each of which is
queueable, and each RW lock is associated with a queue.  Instead of
blocking, the item is queued for later processing.
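
To make the queue-instead-of-block idea concrete, here is a rough
user-space sketch.  It is not the netgraph code: every name in it
(qlock, item, qlock_enter, qlock_leave, trace_handler) is made up for
illustration, it is single-threaded, and a real kernel version would
have to update the state atomically or under a mutex.  It also assumes
that new readers queue behind a waiting writer, which is one way to
keep writers from starving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch, not the netgraph API. */
enum item_kind { ITEM_READER, ITEM_WRITER };

struct item {
    enum item_kind kind;            /* data request or command request */
    int id;                         /* only used by the example handler */
    void (*fn)(struct item *);      /* work to run once access is granted */
    void *arg;
    struct item *next;              /* link while sitting in the queue */
};

struct qlock {
    int readers;                    /* number of active readers */
    int writer;                     /* 1 while a writer is active */
    struct item *head, *tail;       /* FIFO of deferred items */
};

void qlock_init(struct qlock *q)
{
    q->readers = 0;
    q->writer = 0;
    q->head = q->tail = NULL;
}

/* Take the lock for 'kind' if the current state allows it. */
static int try_acquire(struct qlock *q, enum item_kind kind)
{
    if (q->writer)
        return 0;
    if (kind == ITEM_WRITER) {
        if (q->readers > 0)
            return 0;
        q->writer = 1;
    } else {
        q->readers++;
    }
    return 1;
}

static void release_state(struct qlock *q, enum item_kind kind)
{
    if (kind == ITEM_WRITER) {
        assert(q->writer == 1);
        q->writer = 0;
    } else {
        assert(q->readers > 0);
        q->readers--;
    }
}

/*
 * Returns 1 if the caller now holds the lock and should do its work and
 * then call qlock_leave().  Returns 0 if the item was queued; the caller
 * never blocks or sleeps.  Anything arriving while the queue is non-empty
 * is queued too, so readers cannot overtake a waiting writer.
 */
int qlock_enter(struct qlock *q, struct item *it)
{
    if (q->head == NULL && try_acquire(q, it->kind))
        return 1;
    it->next = NULL;
    if (q->tail != NULL)
        q->tail->next = it;
    else
        q->head = it;
    q->tail = it;
    return 0;
}

/* Drop the lock, then run deferred items in FIFO order while they fit. */
void qlock_leave(struct qlock *q, enum item_kind kind)
{
    release_state(q, kind);
    while (q->head != NULL && try_acquire(q, q->head->kind)) {
        struct item *it = q->head;

        q->head = it->next;
        if (q->head == NULL)
            q->tail = NULL;
        it->next = NULL;
        it->fn(it);                 /* runs in the releaser's context */
        release_state(q, it->kind);
    }
}

/* Example handler: record the order in which deferred items ran. */
struct trace { int ids[8]; int n; };

void trace_handler(struct item *it)
{
    struct trace *t = it->arg;

    if (t->n < 8)
        t->ids[t->n++] = it->id;
}
```

With one reader holding the lock, a writer item and then a second
reader item passed to qlock_enter() both come back as queued.  When the
first reader calls qlock_leave(), the writer's work runs first and the
deferred reader's work runs after it.  The cost is that the work runs
later, in the context of whoever releases the lock, so this only fits
callers whose requests can be deferred.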