From owner-freebsd-arch@FreeBSD.ORG Sun Sep 25 00:22:15 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1D0CB16A41F for ; Sun, 25 Sep 2005 00:22:15 +0000 (GMT) (envelope-from thompsa@freebsd.org) Received: from heff.fud.org.nz (60-234-149-201.bitstream.orcon.net.nz [60.234.149.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A33F43D49 for ; Sun, 25 Sep 2005 00:22:14 +0000 (GMT) (envelope-from thompsa@freebsd.org) Received: by heff.fud.org.nz (Postfix, from userid 1001) id BFDA81CCD4; Sun, 25 Sep 2005 12:22:12 +1200 (NZST) Date: Sun, 25 Sep 2005 12:22:12 +1200 From: Andrew Thompson To: Peter Jeremy Message-ID: <20050925002212.GA77857@heff.fud.org.nz> References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050924192237.GP40237@cirb503493.alcatel.com.au> User-Agent: Mutt/1.4.2.1i Cc: freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Sep 2005 00:22:15 -0000 On Sun, Sep 25, 2005 at 05:22:38AM +1000, Peter Jeremy wrote: > On Sat, 2005-Sep-24 15:25:06 +0200, Max Laier wrote: > >for some time now, we have three bridge implementations in the tree: > > - net/bridge.c - the "old" bridge > > - net/if_bridge.c - the "new" bridge from Net/OpenBSD > > - netgraph/ng_bridge.c - the netgraph version [1] > > > >The new code has several advantages over the old version: > > - Spanning Tree Protocol (802.1D) > > - better firewall support (IPv6, stateful filtering, ...) 
> > - easy ifconfig(8) configuration > > Since I've recently needed it, neither bridge.c nor if_bridge.c allow > you to bridge VLAN trunks (you can bridge individual VLANs but that > becomes unwieldly when you have dozens of VLANs). I have code to do > this in bridge.c. I'd like to see what you have done here; can I look at the patch? > > >Please test the new alternative if you are using the old one still. > > Has anyone looked at how difficult it would be to get if_bridge.c to > work in 5.x? http://people.freebsd.org/~thompsa/if_bridge-5stable.20050907.diff I've posted it for testing before but didn't get a response; care to try it out? Andrew From owner-freebsd-arch@FreeBSD.ORG Mon Sep 26 08:24:38 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ADEE416A41F for ; Mon, 26 Sep 2005 08:24:38 +0000 (GMT) (envelope-from maxim@macomnet.ru) Received: from mp2.macomnet.net (mp2.macomnet.net [195.128.64.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0B2CA43D49 for ; Mon, 26 Sep 2005 08:24:37 +0000 (GMT) (envelope-from maxim@macomnet.ru) Received: from localhost (localhost [127.0.0.1]) by mp2.macomnet.net (8.13.3/8.13.3) with ESMTP id j8Q8OZKa009177; Mon, 26 Sep 2005 12:24:36 +0400 (MSD) (envelope-from maxim@macomnet.ru) Date: Mon, 26 Sep 2005 12:24:35 +0400 (MSD) From: Maxim Konovalov To: Max Laier In-Reply-To: <43357207.7080405@samsco.org> Message-ID: <20050926122203.A7055@mp2.macomnet.net> References: <200509241525.16173.max@love2party.net> <43357207.7080405@samsco.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Sep 
2005 08:24:38 -0000 [...] > I'm fine with it being removed in HEAD. You should change the docs and > whatever appropriate manpages in 6-STABLE to clearly indicate that it is ... and please don't forget about GNATS. There are several bridge.c specific PRs (kern/57100, kern/82919 etc). -- Maxim Konovalov From owner-freebsd-arch@FreeBSD.ORG Mon Sep 26 21:24:52 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 791A116A429 for ; Mon, 26 Sep 2005 21:24:52 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id EFDDF43D48 for ; Mon, 26 Sep 2005 21:24:49 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.41.233] (Not Verified[10.50.41.233]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Mon, 26 Sep 2005 17:40:47 -0400 From: John Baldwin To: arch@FreeBSD.org Date: Mon, 26 Sep 2005 17:26:03 -0400 User-Agent: KMail/1.8 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200509261726.04372.jhb@FreeBSD.org> Cc: Subject: atomic_fetchadd_int() and a simple refcount API for non-complicated refcounts X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Sep 2005 21:24:52 -0000 I have two patches here for review. They have been tested on i386, alpha, and sparc64 so far. The first patch adds a new atomic operation: atomic_fetchadd_int() (and its alias: atomic_fetchadd_32()). The operation pretty much maps directly onto the x86 xadd instruction added with the 486. 
I believe it also maps directly onto the fetchadd instruction on ia64 (from whence it got its name). (Note that the ia64 instruction can only take a fixed set of immediate operands to add, such as 1, -1, 2, -2, 4, -4, etc. up to 64 and -64 IIRC.) On architectures where I wasn't sure how to do the inline assembly (ia64, ppc, arm) it just uses an atomic_cmpset() loop, but that can be changed later as an optimization easily enough. Having this new primitive allows for construction of simple standalone reference counts that are cheaper than ones protected by a mutex (1 atomic op vs at least 2). Thus, I used atomic_fetchadd() to build a very simple refcount API that operates on integers. It has an init method to set the initial value, an acquire method to bump a refcount, and a release method to drop a reference. The release method returns non-zero when the last reference is dropped. I used this API to implement the reference counts for ucreds, plimits, pargs, and mbuf clusters (though I know andre@ has plans in the works to hack on the mbuf ones further, so those changes might just be temporary). Also, I know that andre@ wants atomic_fetchadd() for the changes he will be doing. Patches are at the URLs below: http://www.FreeBSD.org/~jhb/patches/atomic_fetchadd.patch http://www.FreeBSD.org/~jhb/patches/refcount_cvs.patch FAQ: Q: The refcount API is too simple and doesn't handle the snortzel-foo edge case!!!! A: Yes. It's not meant to be all-singing and all-dancing. It is simply one tool that is available for use. More complex reference counting schemes can be built using atomic_fetchadd() or other more complex primitives such as mutexes if needed. Q: I think the name 'fetchadd' sucks!! A: Your opinion has been noted. Q: Will this destroy the ABI? A: No. The refcount_*() functions are all inlines. 
The only thing that might break the ABI might be changes to structures to remove mutexes or mutex pointers, but that part of a change can always be left out if this were merged across a branch. Q: Will this be in 6.0? A: Hopefully. The sooner this gets reviewed the better the chances. Due to the previous question, it is possible to merge it to RELENG_6 after 6.0 is released anyway. Q: What color is a snortzel-foo? A: I don't know, probably some sort of lavender. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Mon Sep 26 21:47:37 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5910416A425 for ; Mon, 26 Sep 2005 21:47:37 +0000 (GMT) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3817543D5A for ; Mon, 26 Sep 2005 21:47:34 +0000 (GMT) (envelope-from andre@freebsd.org) Received: (qmail 66457 invoked from network); 26 Sep 2005 21:20:33 -0000 Received: from unknown (HELO freebsd.org) ([62.48.0.54]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 26 Sep 2005 21:20:33 -0000 Message-ID: <43386C78.D0CC2B9@freebsd.org> Date: Mon, 26 Sep 2005 23:47:36 +0200 From: Andre Oppermann X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: John Baldwin References: <200509261726.04372.jhb@FreeBSD.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org Subject: Re: atomic_fetchadd_int() and a simple refcount API for non-complicatedrefcounts X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Mon, 26 Sep 2005 21:47:37 -0000 John Baldwin wrote: > > I have two patches here for review. They have been tested on i386, alpha, and > sparc64 so far. The first patch adds a new atomic operation: > atomic_fetchadd_int() (and its alias: atomic_fetchadd_32()). The operation > pretty much maps directly onto the x86 xadd instruction added with the 486. > I believe it also maps directly onto the fetchadd instruction on ia64 (from > whence it got its name). (Note that the ia64 instruction can only take a > fixed set of immediate operands to add by such as 1, -1, 2, -2, 4, -4, etc. > up to 64 and -64 IIRC.) On architectures where I wasn't sure how to do the > inline assembly (ia64, ppc, arm) it just uses a atomic_cmpset() loop, but > that can be changed later as an optimization easily enough. Having this new > primitive allows for construction of simple standalone reference counts that > are cheaper than ones protected by a mutex (1 atomic op vs at least 2). > Thus, I used atomic_fetchadd() to build a very simple refcount API that > operates on integers. It has an init method to set the initial value, an > acquire method to bump a refcount, and a release method to drop a reference. > The release method returns non-zero when the last reference is dropped. I > used this API to implement the reference counts for ucreds, plimits, pargs, > and mbuf clusters (though I know andre@ has plans in the works to hack on the > mbuf ones further, so those changes might just be temporary). Also, I know > that andre@ wants atomic_fetchadd() for the changes he will be doing. > Patches are at the URLS below: > > http://www.FreeBSD.org/~jhb/patches/atomic_fetchadd.patch > http://www.FreeBSD.org/~jhb/patches/refcount_cvs.patch I have the i386 and amd64 versions of atomic_fetchadd running in my tree for about a week. Works fine so far. ;-) Please don't convert the mbuf cluster refcount to the refcount API. 
I can do my commit reworking mbuf ref counting right after you commit adding the new atomic_fetchadd function. That way we can save one step and I have less trouble re-integrating my tree with HEAD. -- Andre > FAQ: > > Q: The refcount API is too simple and doesn't handle the snortzel-foo > edge case!!!! > > A: Yes. It's not meant to be all-singing and all-dancing. It is simply > available as one tool that is available for use. More complex reference > counting schemes can be built using atomic_fetchadd() or other more complex > primitives such as mutexes if needed. > > Q: I think the name 'fetchadd' sucks!! > > A: Your opinion has been noted. > > Q: Will this destroy the ABI? > > A: No. The refcount_*() functions are all inlines. The only thing that might > break the ABI might be changes to structures to remove mutexes or mutex > pointers, but that part of a change can always be left out if this were > merged across a branch. > > Q: Will this be in 6.0? > > A: Hopefully. The sooner this gets reviewed the better the chances. Due to > the previous question, it is possible to merge it to RELENG_6 after 6.0 is > released anyway. > > Q: What color is a snortzel-foo? > > A: I don't know, probably some sort of lavender. 
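The refcount API quoted above (an init method, an acquire method, and a release method that returns non-zero when the last reference is dropped) can be sketched in userland C roughly as follows. This is only an illustration under stated assumptions, not the code in refcount_cvs.patch: C11 atomic_fetch_add() stands in for the proposed kernel atomic_fetchadd_int(), fetchadd_uint() and refcount_example() are hypothetical helpers, and the names refcount_init(), refcount_acquire(), and refcount_release() are assumed from the refcount_*() prefix mentioned in the FAQ.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userland stand-in for the kernel's atomic_fetchadd_int():
 * atomically add v to *p and return the value *p held before the add.
 */
static inline unsigned int
fetchadd_uint(atomic_uint *p, unsigned int v)
{
	return (atomic_fetch_add(p, v));
}

/* Set the initial reference count. */
static inline void
refcount_init(atomic_uint *count, unsigned int value)
{
	atomic_init(count, value);
}

/* Take one more reference. */
static inline void
refcount_acquire(atomic_uint *count)
{
	(void)fetchadd_uint(count, 1);
}

/* Drop a reference; returns non-zero when the last one was dropped. */
static inline int
refcount_release(atomic_uint *count)
{
	/* Adding (unsigned)-1 subtracts one; an old value of 1 means zero now. */
	return (fetchadd_uint(count, (unsigned int)-1) == 1);
}

/* Usage sketch: two holders share one object; only the second release is last. */
static inline void
refcount_example(void)
{
	atomic_uint refs;

	refcount_init(&refs, 1);		/* creator holds one reference */
	refcount_acquire(&refs);		/* second holder */
	assert(!refcount_release(&refs));	/* one reference still remains */
	assert(refcount_release(&refs));	/* last reference dropped */
}
```

Each acquire or release here costs a single atomic read-modify-write, which is the "1 atomic op vs at least 2" saving over a mutex-protected counter that the original message describes.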
> > -- > John Baldwin <>< http://www.FreeBSD.org/~jhb/ > "Power Users Use the Power to Serve" = http://www.FreeBSD.org > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Tue Sep 27 12:31:26 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3756216A41F; Tue, 27 Sep 2005 12:31:26 +0000 (GMT) (envelope-from is@rambler-co.ru) Received: from yam.park.rambler.ru (yam.park.rambler.ru [81.19.64.116]) by mx1.FreeBSD.org (Postfix) with ESMTP id 64CBA43D5A; Tue, 27 Sep 2005 12:31:24 +0000 (GMT) (envelope-from is@rambler-co.ru) Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102]) by yam.park.rambler.ru (8.13.3/8.13.3) with ESMTP id j8RCVMhF063409; Tue, 27 Sep 2005 16:31:22 +0400 (MSD) (envelope-from is@rambler-co.ru) Date: Tue, 27 Sep 2005 16:31:22 +0400 (MSD) From: Igor Sysoev X-X-Sender: is@is.park.rambler.ru To: Xin LI In-Reply-To: <1127101042.788.30.camel@spirit> Message-ID: <20050927162014.M65594@is.park.rambler.ru> References: <1127101042.788.30.camel@spirit> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-performance@freebsd.org, freebsd-arch@freebsd.org Subject: Re: Combine more operation within one system call: to do it, or not to do it? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Sep 2005 12:31:26 -0000 On Mon, 19 Sep 2005, Xin LI wrote: > It seems that Microsoft has recently revised several of their APIs. 
One > example is their ConnectEx(), as found in documentation [1]. The > implementation is not so complex that it just combines more operation > within one system call, however, this can reduce some unnecessary > context switches as it's now possible to do more things within one > system call. (For instance, when you connect to a server, you usually > want to send some data as request). > > Shall we do something similar? Or do we already done something similar? > > [1] > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/connectex_2.asp Actually, ConnectEx() was implemented to allow posting asynchronous connect operations to the I/O completion ports. There is no API to implement a high-performance proxy in Windows NT and 2000. You have to use non-blocking connect() and then pass the socket to special helper threads that wait in WSAWaitForMultipleEvents() until the sockets are connected. Each thread can handle only up to 63 sockets. Igor Sysoev http://sysoev.ru/en/ From owner-freebsd-arch@FreeBSD.ORG Tue Sep 27 14:34:18 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 383E016A420 for ; Tue, 27 Sep 2005 14:34:18 +0000 (GMT) (envelope-from bucht@acc.umu.se) Received: from khan.acc.umu.se (khan.acc.umu.se [130.239.18.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id C96D743D48 for ; Tue, 27 Sep 2005 14:34:17 +0000 (GMT) (envelope-from bucht@acc.umu.se) Received: from localhost (localhost [127.0.0.1]) by amavisd-new (Postfix) with ESMTP id 5A184D21A for ; Tue, 27 Sep 2005 16:34:15 +0200 (MEST) Received: from shaka.acc.umu.se (shaka.acc.umu.se [130.239.18.148]) by khan.acc.umu.se (Postfix) with ESMTP id C7CACD222 for ; Tue, 27 Sep 2005 16:34:03 +0200 (MEST) Received: by shaka.acc.umu.se (Postfix, from userid 23835) id 8B2CE17213; Tue, 27 Sep 2005 16:34:03 +0200 (MEST) Date: Tue, 27 Sep 
2005 16:34:03 +0200 From: Johan Bucht To: arch@FreeBSD.org Message-ID: <20050927143402.GA7093@shaka.acc.umu.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-Virus-Scanned: by amavisd-new at acc.umu.se Cc: Subject: Re: atomic_fetchadd_int() and a simple refcount API for non-complicatedrefcounts X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Sep 2005 14:34:18 -0000 Where do I order a plush snortzel-foo? =) -- /Johan Bucht bucht@acc.umu.se From owner-freebsd-arch@FreeBSD.ORG Tue Sep 27 14:44:47 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 44E0A16A41F for ; Tue, 27 Sep 2005 14:44:47 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E24D43D70 for ; Tue, 27 Sep 2005 14:44:46 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id 0D0C2BC6D; Tue, 27 Sep 2005 14:44:44 +0000 (UTC) To: Johan Bucht From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 27 Sep 2005 16:34:03 +0200." 
<20050927143402.GA7093@shaka.acc.umu.se> Date: Tue, 27 Sep 2005 16:44:43 +0200 Message-ID: <92873.1127832283@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org Subject: Re: atomic_fetchadd_int() and a simple refcount API for non-complicatedrefcounts X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Sep 2005 14:44:47 -0000 In message <20050927143402.GA7093@shaka.acc.umu.se>, Johan Bucht writes: > >Where do I order a plush snortzel-foo? =) If you have to ask, you're not old enough to have one :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 10:21:59 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A06616A41F for ; Wed, 28 Sep 2005 10:21:59 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 031F743D4C for ; Wed, 28 Sep 2005 10:21:57 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (localhost [127.0.0.1]) by comp.chem.msu.su (8.13.3/8.13.3) with ESMTP id j8SALtcN090756; Wed, 28 Sep 2005 14:21:55 +0400 (MSD) (envelope-from yar@comp.chem.msu.su) Received: (from yar@localhost) by comp.chem.msu.su (8.13.3/8.13.3/Submit) id j8SALs4C090753; Wed, 28 Sep 2005 14:21:54 +0400 (MSD) (envelope-from yar) Date: Wed, 28 Sep 2005 14:21:53 +0400 From: Yar Tikhiy To: Peter Jeremy Message-ID: <20050928102153.GA86457@comp.chem.msu.su> References: <200509241525.16173.max@love2party.net> 
<20050924192237.GP40237@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050924192237.GP40237@cirb503493.alcatel.com.au> User-Agent: Mutt/1.5.9i Cc: Max Laier , freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 10:21:59 -0000 On Sun, Sep 25, 2005 at 05:22:38AM +1000, Peter Jeremy wrote: > > Since I've recently needed it, neither bridge.c nor if_bridge.c allow > you to bridge VLAN trunks (you can bridge individual VLANs but that > becomes unwieldly when you have dozens of VLANs). I have code to do > this in bridge.c. Couldn't you bridge across the parent, or trunk, physical interfaces carrying tagged VLAN traffic then? (Of course, hardware support for VLAN should be turned off on them in that case.) 
-- Yar From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 10:29:49 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 10FB816A41F for ; Wed, 28 Sep 2005 10:29:49 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3309F43D5A for ; Wed, 28 Sep 2005 10:29:48 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j8SATXab018164; Wed, 28 Sep 2005 03:29:33 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j8SATXM9018163; Wed, 28 Sep 2005 03:29:33 -0700 (PDT) (envelope-from rizzo) Date: Wed, 28 Sep 2005 03:29:33 -0700 From: Luigi Rizzo To: Yar Tikhiy Message-ID: <20050928032933.G16027@xorpc.icir.org> References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20050928102153.GA86457@comp.chem.msu.su>; from yar@comp.chem.msu.su on Wed, Sep 28, 2005 at 02:21:53PM +0400 Cc: Peter Jeremy , freebsd-arch@freebsd.org, Max Laier Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 10:29:49 -0000 On Wed, Sep 28, 2005 at 02:21:53PM +0400, Yar Tikhiy wrote: > On Sun, Sep 25, 2005 at 05:22:38AM +1000, Peter Jeremy wrote: > > > > Since I've recently needed it, neither bridge.c nor if_bridge.c allow > > you to bridge VLAN trunks (you can bridge individual 
VLANs but that > > becomes unwieldly when you have dozens of VLANs). I have code to do > > this in bridge.c. > > Couldn't you bridge across the parent, or trunk, physical interfaces > carrying tagged VLAN traffic then? (Of course, hardware support for > VLAN should be turned off on them in that case.) yes in fact i was wondering what's wrong with that because we have been using bridge.c like this for ages now... cheers luigi > -- > Yar > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 16:48:06 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DA0F616A41F for ; Wed, 28 Sep 2005 16:48:06 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1578E43D48 for ; Wed, 28 Sep 2005 16:48:05 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (localhost [127.0.0.1]) by comp.chem.msu.su (8.13.3/8.13.3) with ESMTP id j8SGm47L013702 for ; Wed, 28 Sep 2005 20:48:04 +0400 (MSD) (envelope-from yar@comp.chem.msu.su) Received: (from yar@localhost) by comp.chem.msu.su (8.13.3/8.13.3/Submit) id j8SGm3SD013701 for arch@freebsd.org; Wed, 28 Sep 2005 20:48:03 +0400 (MSD) (envelope-from yar) Date: Wed, 28 Sep 2005 20:48:03 +0400 From: Yar Tikhiy To: arch@freebsd.org Message-ID: <20050928164803.GA11556@comp.chem.msu.su> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.9i Cc: Subject: Minor issues in our rcNG X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 16:48:07 -0000 Hi there, The larger issue I'd like to discuss is as follows. Presently, ${_program} variables are special in that they always override ${command}. Some rc.d scripts (ab)use this to skip setting ${command}, e.g., sshd. Some other scripts (ab)use ${_program} even though they do not just start ${command} once, e.g., pf, and so make rc.subr think they have functionality that isn't really there, such as poll. Perhaps it's time to separate these two cases somehow? I can see two possible approaches. One is to require that scripts just starting a daemon always set ${command}, and to apply the attached patch to rc.subr. With the patch, ${command} is overridden only if it was set in the first place. The other approach is to prevent non-daemon scripts from abusing ${_program}. However, in the latter case, defining ${_program}, e.g., by mistake, can still affect the script when it should not. The second issue is rather small. The rc command "reload" is supported by code in /etc/rc.subr, but it doesn't appear on the default list of commands, unlike "status" or "poll", when there is ${pidfile} or ${command} set, and so it is unusable by default. Scripts have to use ``extra_commands=reload''. In addition, all this is undocumented. Should "reload" be added to the list of available commands along with "status" and "poll"? Finally, we have a script named rcconf.sh, which doesn't seem to do anything useful now since all the other scripts invoke load_rc_config on their own. Can we drop it then? 
-- Yar --- rc.subr~ 2005/08/24 16:37:28 +++ rc.subr 2005/09/20 00:03:41 @@ -493,9 +493,7 @@ esac eval _overide_command=\$${name}_program - if [ -n "$_overide_command" ]; then - command=$_overide_command - fi + command=${command:+${_overide_command:-$command}} _keywords="start stop restart rcvar $extra_commands" rc_pid= From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 18:48:12 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 16F7316A41F for ; Wed, 28 Sep 2005 18:48:12 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6C89A43D4C for ; Wed, 28 Sep 2005 18:48:11 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (c220-239-19-236.belrs4.nsw.optusnet.com.au [220.239.19.236]) by mail10.syd.optusnet.com.au (8.12.11/8.12.11) with ESMTP id j8SIlXgV002963 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 29 Sep 2005 04:47:34 +1000 Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1]) by cirb503493.alcatel.com.au (8.12.10/8.12.10) with ESMTP id j8SIlXSR073117; Thu, 29 Sep 2005 04:47:33 +1000 (EST) (envelope-from pjeremy@cirb503493.alcatel.com.au) Received: (from pjeremy@localhost) by cirb503493.alcatel.com.au (8.12.10/8.12.9/Submit) id j8SIlWNV073116; Thu, 29 Sep 2005 04:47:32 +1000 (EST) (envelope-from pjeremy) Date: Thu, 29 Sep 2005 04:47:32 +1000 From: Peter Jeremy To: Luigi Rizzo Message-ID: <20050928184731.GA72352@cirb503493.alcatel.com.au> References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20050928032933.G16027@xorpc.icir.org> User-Agent: Mutt/1.4.2.1i Cc: Yar Tikhiy , freebsd-arch@freebsd.org, Max Laier Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 18:48:12 -0000 On Wed, 2005-Sep-28 03:29:33 -0700, Luigi Rizzo wrote: >On Wed, Sep 28, 2005 at 02:21:53PM +0400, Yar Tikhiy wrote: >> On Sun, Sep 25, 2005 at 05:22:38AM +1000, Peter Jeremy wrote: >> > >> > Since I've recently needed it, neither bridge.c nor if_bridge.c allow >> > you to bridge VLAN trunks (you can bridge individual VLANs but that >> > becomes unwieldly when you have dozens of VLANs). I have code to do >> > this in bridge.c. >> >> Couldn't you bridge across the parent, or trunk, physical interfaces >> carrying tagged VLAN traffic then? (Of course, hardware support for >> VLAN should be turned off on them in that case.) That's actually what I was trying to do. >yes in fact i was wondering what's wrong with that because >we have been using bridge.c like this for ages now... The problem is that the current bridge code only considers the MAC address for forwarding. When VLANs are in use, this is incorrect as both the MAC address and VLAN tag must be considered. The difference is crucial when you have the same MAC address appearing in multiple VLANs. This can occur when using DECnet Phase IV or Solaris with Cassini NICs - both of which have a per-host MAC address rather than a per-NIC MAC address. As an example, consider a system with a host-based MAC address that has two NICs. One NIC attaches to VLAN 123 on switch a, the other attaches to VLAN 124 on switch b [this is the situation we have in our test lab]. 
If I then attempt to join trunks from both switches using bridge(4), it sees the same MAC address on both bridged interfaces and shuts down. In reality, this situation is safe because the MAC addresses are in different VLANs. -- Peter Jeremy From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 23:17:09 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 659C116A41F for ; Wed, 28 Sep 2005 23:17:09 +0000 (GMT) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: from digger1.defence.gov.au (digger1.defence.gov.au [203.5.217.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 18C1443D5F for ; Wed, 28 Sep 2005 23:16:56 +0000 (GMT) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: from ednmsw501.dsto.defence.gov.au (ednmsw501.dsto.defence.gov.au [131.185.2.150]) by digger1.defence.gov.au with ESMTP id j8SNEvRl003760 for ; Thu, 29 Sep 2005 08:44:57 +0930 (CST) Received: from muttley.dsto.defence.gov.au (unverified) by ednmsw501.dsto.defence.gov.au (Content Technologies SMTPRS 4.3.17) with ESMTP id ; Thu, 29 Sep 2005 08:46:47 +0930 Received: from ednex501.dsto.defence.gov.au (ednex501.dsto.defence.gov.au [131.185.2.81]) by muttley.dsto.defence.gov.au (8.11.3/8.11.3) with ESMTP id j8SNDm018882; Thu, 29 Sep 2005 08:43:48 +0930 (CST) Received: from squash.dsto.defence.gov.au ([131.185.40.212]) by ednex501.dsto.defence.gov.au with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id SJZBS2AB; Thu, 29 Sep 2005 08:43:46 +0930 Received: from squash.dsto.defence.gov.au (localhost [127.0.0.1]) by squash.dsto.defence.gov.au (8.13.3/8.13.3) with ESMTP id j8SNECYP038403; Thu, 29 Sep 2005 08:44:12 +0930 (CST) (envelope-from wilkinsa@squash.dsto.defence.gov.au) Received: (from wilkinsa@localhost) by squash.dsto.defence.gov.au (8.13.3/8.13.3/Submit) id j8SNE9Rw038402; Thu, 29 Sep 2005 08:44:09 +0930 (CST) 
(envelope-from wilkinsa) Date: Thu, 29 Sep 2005 08:44:09 +0930 From: "Wilkinson, Alex" To: Peter Jeremy Message-ID: <20050928231409.GB38338@squash.dsto.defence.gov.au> Mail-Followup-To: Peter Jeremy , Luigi Rizzo , Yar Tikhiy , freebsd-arch@freebsd.org, Max Laier References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> <20050928184731.GA72352@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20050928184731.GA72352@cirb503493.alcatel.com.au> User-Agent: Mutt/1.5.10i Cc: freebsd-arch@freebsd.org, Max Laier , Yar Tikhiy Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 23:17:09 -0000 On Thu, Sep 29, 2005 at 04:47:32AM +1000, Peter Jeremy wrote: >On Wed, 2005-Sep-28 03:29:33 -0700, Luigi Rizzo wrote: >>On Wed, Sep 28, 2005 at 02:21:53PM +0400, Yar Tikhiy wrote: >>> On Sun, Sep 25, 2005 at 05:22:38AM +1000, Peter Jeremy wrote: >>> > >>> > Since I've recently needed it, neither bridge.c nor if_bridge.c allow >>> > you to bridge VLAN trunks (you can bridge individual VLANs but that >>> > becomes unwieldly when you have dozens of VLANs). I have code to do >>> > this in bridge.c. >>> >>> Couldn't you bridge across the parent, or trunk, physical interfaces >>> carrying tagged VLAN traffic then? (Of course, hardware support for >>> VLAN should be turned off on them in that case.) > >That's actually what I was trying to do. > >>yes in fact i was wondering what's wrong with that because >>we have been using bridge.c like this for ages now... > >The problem is that the current bridge code only considers the MAC >address for forwarding.
When VLANs are in use, this is incorrect as >both the MAC address and VLAN tag must be considered. The difference >is crucial when you have the same MAC address appearing in multiple >VLANs. This can occur when using DECnet Phase IV or Solaris with >Cassini NICs - both of which have a per-host MAC address rather than a >per-NIC MAC address. > >As an example, consider a system with a host-based MAC address that >has two NICs. One NIC attaches to VLAN 123 on switch a, the other >attaches to VLAN 124 on switch b [this is the situation we have in our >test lab]. If I then attempt to join trunks from both switches using >bridge(4), it sees the same MAC address on both bridged interfaces and >shuts down. In reality, this situation is safe because the MAC >addresses are in different VLANs. Peter, What is the difference between a "per-host MAC address" and a "per-NIC MAC address" ? - aW From owner-freebsd-arch@FreeBSD.ORG Wed Sep 28 23:49:09 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12FA216A41F for ; Wed, 28 Sep 2005 23:49:09 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (comp.chem.msu.su [158.250.32.97]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2A27A43D48 for ; Wed, 28 Sep 2005 23:49:07 +0000 (GMT) (envelope-from yar@comp.chem.msu.su) Received: from comp.chem.msu.su (localhost [127.0.0.1]) by comp.chem.msu.su (8.13.3/8.13.3) with ESMTP id j8SNn59D036297; Thu, 29 Sep 2005 03:49:05 +0400 (MSD) (envelope-from yar@comp.chem.msu.su) Received: (from yar@localhost) by comp.chem.msu.su (8.13.3/8.13.3/Submit) id j8SNn5Ic036296; Thu, 29 Sep 2005 03:49:05 +0400 (MSD) (envelope-from yar) Date: Thu, 29 Sep 2005 03:49:05 +0400 From: Yar Tikhiy To: Peter Jeremy Message-ID: <20050928234905.GA36083@comp.chem.msu.su> References: <200509241525.16173.max@love2party.net> 
<20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> <20050928184731.GA72352@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050928184731.GA72352@cirb503493.alcatel.com.au> User-Agent: Mutt/1.5.9i Cc: freebsd-arch@freebsd.org, Max Laier Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Sep 2005 23:49:09 -0000 On Thu, Sep 29, 2005 at 04:47:32AM +1000, Peter Jeremy wrote: > > The problem is that the current bridge code only considers the MAC > address for forwarding. When VLANs are in use, this is incorrect as > both the MAC address and VLAN tag must be considered. The difference > is crucial when you have the same MAC address appearing in multiple > VLANs. This can occur when using DECnet Phase IV or Solaris with > Cassini NICs - both of which have a per-host MAC address rather than a > per-NIC MAC address. FWIW, this sounds quite reasonable to me. Indeed, there is plenty of good and not-so-good reasons for the same MAC address to appear on different VLANs, and it seems a licit case. 
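To make the point concrete, here is a minimal sketch of a forwarding table keyed on the VLAN tag together with the MAC address. It is not taken from bridge.c or if_bridge.c: the names (fdb_learn, fdb_lookup, FDB_SIZE) are hypothetical, and a small linear table stands in for whatever lookup structure a real bridge would use.

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 64		/* illustrative table size (assumption) */

/* One learned address: the key is (vlan, mac), not mac alone. */
struct fdb_entry {
	uint8_t  mac[6];
	uint16_t vlan;		/* VLAN ID the address was seen in */
	int      port;		/* bridge member it was seen on */
	int      used;
};

static struct fdb_entry fdb[FDB_SIZE];

/* Return the entry matching both VLAN and MAC, or NULL. */
static struct fdb_entry *
fdb_find(uint16_t vlan, const uint8_t mac[6])
{
	for (int i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].used && fdb[i].vlan == vlan &&
		    memcmp(fdb[i].mac, mac, 6) == 0)
			return (&fdb[i]);
	}
	return (NULL);
}

/* Record the source of a frame. Returns 0, or -1 if the table is full. */
int
fdb_learn(uint16_t vlan, const uint8_t mac[6], int port)
{
	struct fdb_entry *e = fdb_find(vlan, mac);

	if (e != NULL) {
		e->port = port;		/* station moved within its VLAN */
		return (0);
	}
	for (int i = 0; i < FDB_SIZE; i++) {
		if (!fdb[i].used) {
			memcpy(fdb[i].mac, mac, 6);
			fdb[i].vlan = vlan;
			fdb[i].port = port;
			fdb[i].used = 1;
			return (0);
		}
	}
	return (-1);
}

/* Return the port for (vlan, mac), or -1 if unknown (caller floods). */
int
fdb_lookup(uint16_t vlan, const uint8_t mac[6])
{
	struct fdb_entry *e = fdb_find(vlan, mac);

	return (e != NULL ? e->port : -1);
}
```

With this key, a host that presents one MAC address on VLAN 123 and again on VLAN 124 produces two independent entries, so seeing the address on two bridge members is not mistaken for a loop.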
-- Yar From owner-freebsd-arch@FreeBSD.ORG Thu Sep 29 09:08:18 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E5CBB16A41F for ; Thu, 29 Sep 2005 09:08:18 +0000 (GMT) (envelope-from tataz@tataz.chchile.org) Received: from smtp1-g19.free.fr (smtp1-g19.free.fr [212.27.42.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8BF6F43D49 for ; Thu, 29 Sep 2005 09:08:18 +0000 (GMT) (envelope-from tataz@tataz.chchile.org) Received: from tatooine.tataz.chchile.org (vol75-8-82-233-239-98.fbx.proxad.net [82.233.239.98]) by smtp1-g19.free.fr (Postfix) with ESMTP id 8C7322FA67; Thu, 29 Sep 2005 11:08:17 +0200 (CEST) Received: by tatooine.tataz.chchile.org (Postfix, from userid 1000) id B83D7405A; Thu, 29 Sep 2005 11:08:18 +0200 (CEST) Date: Thu, 29 Sep 2005 11:08:18 +0200 From: Jeremie Le Hen To: Yar Tikhiy Message-ID: <20050929090818.GD1086@obiwan.tataz.chchile.org> References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050928102153.GA86457@comp.chem.msu.su> User-Agent: Mutt/1.5.10i Cc: freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Sep 2005 09:08:19 -0000 Hi Yar, > Couldn't you bridge across the parent, or trunk, physical interfaces > carrying tagged VLAN traffic then? (Of course, hardware support for > VLAN should be turned off on them in that case.) Since neither ipfw nor pf can filter on VLAN tag at layer 2, this could be pretty useful to be able to bridge vlan(4) interfaces together. 
For administrative reasons, you may not want all the VLANs living on a physical network to be visible on the other side of the bridge. I also know of another situation where this can be useful. I was once asked to build a single firewall for a whole rack of servers. These servers were remotely administered by customers, and therefore we had no security control over them. Thus we wanted the firewall to protect the servers from the Internet but also from the other servers around them, which may have been defaced. For other reasons, we needed a bridge and no NAT was possible. The idea was to give each server its own VLAN and have the firewall bridge them together. I set up this firewall with Linux; I would be glad to be able to do so with FreeBSD. Regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 06:55:48 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F0D016A420 for ; Fri, 30 Sep 2005 06:55:48 +0000 (GMT) (envelope-from dougb@FreeBSD.org) Received: from mail2.fluidhosting.com (mail2.fluidhosting.com [204.14.90.62]) by mx1.FreeBSD.org (Postfix) with SMTP id 7280A43D53 for ; Fri, 30 Sep 2005 06:55:47 +0000 (GMT) (envelope-from dougb@FreeBSD.org) Received: (qmail 84445 invoked by uid 399); 30 Sep 2005 06:55:46 -0000 Received: from mail1.fluidhosting.com (204.14.90.61) by mail2.fluidhosting.com with SMTP; 30 Sep 2005 06:55:46 -0000 Received: (qmail 11639 invoked by uid 399); 30 Sep 2005 06:55:46 -0000 Received: from localhost (HELO ?192.168.1.102?)
(dougb@dougbarton.net@127.0.0.1) by localhost with SMTP; 30 Sep 2005 06:55:46 -0000 Message-ID: <433CE171.8080301@FreeBSD.org> Date: Thu, 29 Sep 2005 23:55:45 -0700 From: Doug Barton Organization: http://www.FreeBSD.org/ User-Agent: Mozilla Thunderbird 1.0.6 (X11/20050929) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Yar Tikhiy References: <20050928164803.GA11556@comp.chem.msu.su> In-Reply-To: <20050928164803.GA11556@comp.chem.msu.su> X-Enigmail-Version: 0.92.1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, freebsd-rc@FreeBSD.org Subject: Re: Minor issues in our rcNG X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-rc@FreeBSD.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 06:55:48 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ok, so, this is going to seem like "pick on Yar" night, but it really isn't meant to be. :) We have a list for discussion of issues related to the rc.d system (it's no longer referred to as rcNG), so I'm cc'ing that list, and would ask that you follow up there instead of -arch. Yar Tikhiy wrote: | Hi there, | | The larger issue I'd like to discuss is as follows. Presently, | ${_program} variables are special in that they always override | ${command}. Some rc.d scripts (ab)use this to skip setting ${command}, | e.g., sshd. Some other scripts (ab)use ${_program} despite | they are not just starting ${command} once, e.g., pf, and so make | rc.subr think they have functionality that isn't really there, such as | poll. Perhaps it's time to separate these two cases in a way? I think that here, and in the rest of the message you've stated the problem you're concerned about well, but I confess that I'm not 100% sure what the negative effects of this problem are.
Sorry if I'm being dense here, but before I get excited about this I want to be sure I understand the problem. | I can see two possible approaches. One is to require scripts just | starting a daemon always set ${command}, and to apply the attached | patch to rc.subr. With the patch, ${command} is overridden only | if it was set in the first place. The other approach is to prevent | non-daemon scripts from abusing ${_program}. However, in the | latter case defining ${_program}, e.g., by mistake, still can | affect the script while it should not. | | The second issue is rather small. The rc command "reload" is | supported by code in /etc/rc.subr, but it doesn't appear on the | default list of commands unlike "status" or "poll" when there is | ${pidfile} or ${command} set, and so it is unusable by default. | Scripts have to use ``extra_commands=reload''. In addition, all | this is undocumented. Should "reload" be added to the list of | available commands along with "status" and "poll"? I think this sounds reasonable. Do you have patches for this? | Finally, we have a script named rcconf.sh which doesn't seem | to do anything useful now since all the other scripts invoke | load_rc_config by their own. Can we drop it then? I wouldn't lose sleep over this one, I'm planning to deal with it when I introduce the changes for rcorder in /usr/local/etc/rc.d. 
Doug - -- ~ This .signature sanitized for your protection -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFDPOFxyIakK9Wy8PsRArroAKC6lkrsFVBZVXv+2Cwh+2/F06BEKQCfS70B +ydJ4hsRhsdVuM+KtXEbxfc= =WKZt -----END PGP SIGNATURE----- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 09:56:52 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9824016A41F for ; Fri, 30 Sep 2005 09:56:52 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from mail25.syd.optusnet.com.au (mail25.syd.optusnet.com.au [211.29.133.166]) by mx1.FreeBSD.org (Postfix) with ESMTP id E802843D49 for ; Fri, 30 Sep 2005 09:56:51 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (c220-239-19-236.belrs4.nsw.optusnet.com.au [220.239.19.236]) by mail25.syd.optusnet.com.au (8.12.11/8.12.11) with ESMTP id j8U9un7w015243 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 30 Sep 2005 19:56:50 +1000 Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1]) by cirb503493.alcatel.com.au (8.12.10/8.12.10) with ESMTP id j8U9unSR075748 for ; Fri, 30 Sep 2005 19:56:49 +1000 (EST) (envelope-from pjeremy@cirb503493.alcatel.com.au) Received: (from pjeremy@localhost) by cirb503493.alcatel.com.au (8.12.10/8.12.9/Submit) id j8U9unwE075747 for freebsd-arch@freebsd.org; Fri, 30 Sep 2005 19:56:49 +1000 (EST) (envelope-from pjeremy) Date: Fri, 30 Sep 2005 19:56:49 +1000 From: Peter Jeremy To: freebsd-arch@freebsd.org Message-ID: <20050930095649.GK72352@cirb503493.alcatel.com.au> References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> <20050928184731.GA72352@cirb503493.alcatel.com.au> 
<20050928231409.GB38338@squash.dsto.defence.gov.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050928231409.GB38338@squash.dsto.defence.gov.au> User-Agent: Mutt/1.4.2.1i Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 09:56:52 -0000 On Thu, 2005-Sep-29 08:44:09 +0930, Wilkinson, Alex wrote: >What is the difference between a "per-host MAC address" and a "per-NIC >MAC address" ? All NICs have a unique MAC address. This address can be over-ridden by the host if it needs to have the same MAC address appear on multiple interfaces. Of the two cases I mentioned: DECnet changes all MAC addresses to one beginning AA0055 where the low bits are the host's DECnet address. This removes the need for IP's ARP since the source host can determine the destination host's MAC address without needing to ask the network. Some versions of Solaris with some NICs (definitely Solaris 8 with Cassini NICs) associate a MAC address with the host, rather than the NIC. I'm less certain of the rationale for this. 
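A minimal sketch of the DECnet derivation described above, assuming the Phase IV convention of a 6-bit area and a 10-bit node packed into 16 bits and stored low byte first. The function name is hypothetical. The AA-00-04-00 prefix used here is the value commonly documented for Phase IV, which differs from the prefix quoted above, so treat the constant as an assumption and adjust it if your references disagree.

```c
#include <stdint.h>

/*
 * Build the Ethernet address a DECnet Phase IV host would use for the
 * DECnet address area.node. Because the MAC is a pure function of the
 * DECnet address, a sender can compute it locally instead of asking the
 * network for it.
 */
void
decnet_to_mac(unsigned area, unsigned node, uint8_t mac[6])
{
	/* 16-bit DECnet address: area in the top 6 bits, node in the low 10 */
	uint16_t addr = (uint16_t)(((area & 0x3f) << 10) | (node & 0x3ff));

	mac[0] = 0xaa;		/* assumed fixed prefix AA-00-04-00 */
	mac[1] = 0x00;
	mac[2] = 0x04;
	mac[3] = 0x00;
	mac[4] = (uint8_t)(addr & 0xff);	/* low byte first */
	mac[5] = (uint8_t)(addr >> 8);
}
```

Every interface on the host gets the same result, which is how the same MAC address ends up on two different switch ports.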
-- Peter Jeremy From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 10:03:00 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B98F216A420 for ; Fri, 30 Sep 2005 10:03:00 +0000 (GMT) (envelope-from ceri@submonkey.net) Received: from shrike.submonkey.net (cpc2-cdif2-3-1-cust208.cdif.cable.ntl.com [82.31.78.208]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2BE1143D49 for ; Fri, 30 Sep 2005 10:02:59 +0000 (GMT) (envelope-from ceri@submonkey.net) Received: from ceri by shrike.submonkey.net with local (Exim 4.52 (FreeBSD)) id 1ELHjF-0004mI-C2; Fri, 30 Sep 2005 11:02:57 +0100 Date: Fri, 30 Sep 2005 11:02:57 +0100 From: Ceri Davies To: Peter Jeremy Message-ID: <20050930100257.GA4190@submonkey.net> Mail-Followup-To: Ceri Davies , Peter Jeremy , freebsd-arch@freebsd.org References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> <20050928184731.GA72352@cirb503493.alcatel.com.au> <20050928231409.GB38338@squash.dsto.defence.gov.au> <20050930095649.GK72352@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UugvWAfsgieZRqgk" Content-Disposition: inline In-Reply-To: <20050930095649.GK72352@cirb503493.alcatel.com.au> X-PGP: finger ceri@FreeBSD.org User-Agent: Mutt/1.5.11 Sender: Ceri Davies Cc: freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 10:03:00 -0000 --UugvWAfsgieZRqgk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 
quoted-printable On Fri, Sep 30, 2005 at 07:56:49PM +1000, Peter Jeremy wrote: > On Thu, 2005-Sep-29 08:44:09 +0930, Wilkinson, Alex wrote: > >What is the difference between a "per-host MAC address" and a "per-NIC > >MAC address" ? > > All NICs have a unique MAC address. This address can be over-ridden by > the host if it needs to have the same MAC address appear on multiple > interfaces. > > Of the two cases I mentioned: DECnet changes all MAC addresses to one > beginning AA0055 where the low bits are the host's DECnet address. > This removes the need for IP's ARP since the source host can determine > the destination host's MAC address without needing to ask the network. > > Some versions of Solaris with some NICs (definitely Solaris 8 with > Cassini NICs) associate a MAC address with the host, rather than the > NIC. I'm less certain of the rationale for this. That's a setting in the SPARC eeprom: local-mac-address? set to true uses the MAC addresses on the NICs. The rationale is not clear to me either. Ceri -- Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. -- Einstein (attrib.)
--UugvWAfsgieZRqgk Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFDPQ1RocfcwTS3JF8RAsRgAJoDiv/pQ8QWiMRpfyHLZW9KbsN9xwCgsUrc v0AGwguvPMFenjL6dMopp8c= =I684 -----END PGP SIGNATURE----- --UugvWAfsgieZRqgk-- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 10:57:48 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 46E6716A41F for ; Fri, 30 Sep 2005 10:57:48 +0000 (GMT) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.FreeBSD.org (Postfix) with ESMTP id C202D43D53 for ; Fri, 30 Sep 2005 10:57:47 +0000 (GMT) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id D7EEB6155; Fri, 30 Sep 2005 12:57:32 +0200 (CEST) Received: from xps.des.no (des.no [80.203.228.37]) by tim.des.no (Postfix) with ESMTP id C80216152; Fri, 30 Sep 2005 12:57:32 +0200 (CEST) Received: by xps.des.no (Postfix, from userid 1001) id A227333C1D; Fri, 30 Sep 2005 12:57:41 +0200 (CEST) To: Peter Jeremy References: <200509241525.16173.max@love2party.net> <20050924192237.GP40237@cirb503493.alcatel.com.au> <20050928102153.GA86457@comp.chem.msu.su> <20050928032933.G16027@xorpc.icir.org> <20050928184731.GA72352@cirb503493.alcatel.com.au> <20050928231409.GB38338@squash.dsto.defence.gov.au> <20050930095649.GK72352@cirb503493.alcatel.com.au> From: des@des.no (=?iso-8859-1?q?Dag-Erling_Sm=F8rgrav?=) Date: Fri, 30 Sep 2005 12:57:41 +0200 In-Reply-To: <20050930095649.GK72352@cirb503493.alcatel.com.au> (Peter Jeremy's message of "Fri, 30 Sep 2005 19:56:49 +1000") Message-ID: <86y85eesbe.fsf@xps.des.no> User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable 
X-Spam-Tests: ALL_TRUSTED,AWL,BAYES_00 X-Spam-Learn: ham X-Spam-Score: -5.2/3.0 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on tim.des.no Cc: freebsd-arch@freebsd.org Subject: Re: Bridges X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 10:57:48 -0000 Peter Jeremy writes: > Some versions of Solaris with some NICs (definitely Solaris 8 with > Cassini NICs) associate a MAC address with the host, rather than the > NIC. I'm less certain of the rationale for this. Traditionally, Sun boxen had 48-bit serial numbers which they used as MAC addresses on all interfaces. I believe this MAC-per-host setup is what the Ethernet inventors originally intended. Sun now uses off-the-shelf components to a much larger degree than they used to, so most of their stuff is now MAC-per-interface. DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 12:40:04 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8EB5916A420; Fri, 30 Sep 2005 12:40:04 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (cell.sick.ru [217.72.144.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id A02AE43D53; Fri, 30 Sep 2005 12:40:03 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (glebius@localhost [127.0.0.1]) by cell.sick.ru (8.13.3/8.13.3) with ESMTP id j8UCe0nT045638 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Sep 2005 16:40:01 +0400 (MSD) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.sick.ru (8.13.3/8.13.1/Submit) id j8UCe03o045634; Fri, 30 Sep 2005 16:40:00 +0400 (MSD) (envelope-from glebius@FreeBSD.org)
X-Authentication-Warning: cell.sick.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Fri, 30 Sep 2005 16:40:00 +0400 From: Gleb Smirnoff To: arch@FreeBSD.org, net@FreeBSD.org Message-ID: <20050930124000.GA45345@cell.sick.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="k+w/mQv8wyuph6w0" Content-Disposition: inline User-Agent: Mutt/1.5.6i Cc: Subject: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 12:40:04 -0000 --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline [please, follow-up on net@ only] Colleagues, here are some patches for review. Problems addressed: 1) When Giant was removed from polling a problem was introduced. The idle poll feature was broken. The idle poll thread can enter the polling handler on one interface and be put to sleep for a long time, until CPU resources are found. During this time no traffic is received on the interface. Well, this is what an idle thread is supposed to do. Why didn't this happen with Giant? Because idle poll entered the poll handler holding Giant, and other threads (in particular netisr poll) contested on Giant and propagated their priority to idle poll. Well, this is a hack, but idle poll significantly improves polling performance on an idle box, that's why I won't axe it but try to fix it. To address the problem we need to use the same technique as before, but use poll_mtx instead of Giant. However, this will resurrect LORs that were fixed with Giant removal. The alternative lock path occurs when a driver decides to deregister from polling itself. The LOR is fixed by further changes. See 3). 2) Drivers indicate their ability to do polling(4) with the IFCAP_POLLING flag in the if_capabilities field.
Setting the flag in if_capenable should register the interface with polling and disable interrupts. However, if_flags is also abused with the IFF_POLLING flag. The aim is to remove the IFF_POLLING flag. 3) Polling is not switched on and off functionally. That is, when you say 'sysctl kern.polling.enable=1' or 'ifconfig fxp0 -polling', polling is switched on/off not immediately but on the next tick or next interrupt. This non-functional approach leads to a lot of ambiguities in the code and makes it harder to understand and maintain. It also exposes race conditions. The attached patch removes: - IFF_POLLING flag. - Use of if_flags, if_drv_flags, if_capenable from kern_poll.c. All current accesses to these fields are not locked and polling shouldn't look there. - poll in trap feature. Sorry, we can't acquire mutexes in trap(). Has anyone used it, anyway? - POLL_DEREGISTER command. No hacks. Everything is done functionally via ioctl(). The new world order for drivers is the following: 1) Declare IFCAP_POLLING in if_capabilities on attach. Do not touch if_capenable. 2) In the ioctl method, in the SIOCSIFCAP case, the driver should: - call ether_poll_[de]register - if no error, set the IFCAP_POLLING flag in if_capenable - obtain driver lock - [dis/en]able interrupts - drop driver lock 3) In the poll method, check the IFF_DRV_RUNNING flag after obtaining the driver lock 4) In the interrupt handler, check the IFCAP_POLLING flag in if_capenable. If present, then return. This is important to protect from spurious interrupts. 5) In the device detach method, call ether_poll_deregister() before obtaining the driver lock. The new world order for users is the following: - polling should be enabled and disabled with ifconfig(8) - kern.polling.enable is now deprecated. It is kept for some compatibility. When you set it to 1, polling is turned on for all capable interfaces. In case of 0 polling is turned off. - no poll in trap The attached patch touches only em(4) and fxp(4). I will write patches for all other drivers ASAP.
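As a rough illustration of the five driver-side steps listed above, here is a small user-space model. It is not part of the attached patch and not kernel code: struct model_if, the model_* functions, CAP_POLLING and the error values returned by the registration stand-ins are all hypothetical, and the driver lock is only marked by comments.

```c
#include <errno.h>
#include <stdbool.h>

#define CAP_POLLING 0x1		/* stand-in for IFCAP_POLLING */

/* Hypothetical model of one interface; not the kernel's struct ifnet. */
struct model_if {
	int  capabilities;	/* what the driver is able to do */
	int  capenable;		/* what is currently switched on */
	bool registered;	/* stand-in for the polling registration */
	bool intr_enabled;	/* state of the hardware interrupt mask */
	bool running;		/* stand-in for IFF_DRV_RUNNING */
	int  polled;		/* work done by the poll method */
	int  handled;		/* work done by the interrupt handler */
};

/* Stand-ins for ether_poll_register() and ether_poll_deregister(). */
int
model_poll_register(struct model_if *ifp)
{
	if (ifp->registered)
		return (EEXIST);
	ifp->registered = true;
	return (0);
}

int
model_poll_deregister(struct model_if *ifp)
{
	if (!ifp->registered)
		return (ENOENT);
	ifp->registered = false;
	return (0);
}

/* Step 1: advertise the capability on attach, leave capenable alone. */
void
model_attach(struct model_if *ifp)
{
	ifp->capabilities = CAP_POLLING;
	ifp->capenable = 0;
	ifp->registered = false;
	ifp->intr_enabled = true;
	ifp->running = true;
	ifp->polled = 0;
	ifp->handled = 0;
}

/* Step 2: the SIOCSIFCAP case of the ioctl method. */
int
model_set_caps(struct model_if *ifp, int reqcap)
{
	int error = 0;
	int mask = reqcap ^ ifp->capenable;	/* bits that change */

	if ((mask & CAP_POLLING) == 0)
		return (0);
	if (reqcap & CAP_POLLING) {
		error = model_poll_register(ifp);
		if (error != 0)
			return (error);
		/* the driver lock would be held from here ... */
		ifp->intr_enabled = false;
		ifp->capenable |= CAP_POLLING;
		/* ... to here */
	} else {
		error = model_poll_deregister(ifp);
		/* interrupts come back even if deregistration failed */
		ifp->intr_enabled = true;
		ifp->capenable &= ~CAP_POLLING;
	}
	return (error);
}

/* Step 3: the poll method does nothing unless the interface is running. */
void
model_poll(struct model_if *ifp)
{
	if (!ifp->running)
		return;
	ifp->polled++;
}

/* Step 4: the interrupt handler ignores interrupts while polling. */
void
model_intr(struct model_if *ifp)
{
	if (ifp->capenable & CAP_POLLING)
		return;
	ifp->handled++;
}

/* Step 5: deregister before the driver lock would be taken in detach. */
void
model_detach(struct model_if *ifp)
{
	if (ifp->capenable & CAP_POLLING)
		(void)model_poll_deregister(ifp);
	ifp->running = false;
}
```

The point of the model is that the switch between interrupt mode and polling mode happens entirely inside model_set_caps(), at the moment the request is made, rather than on a later tick or interrupt.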
But I don't have all the hardware, so if you are using polling(4) and you run FreeBSD 6 or 7, please help me with testing. ATM only the em(4) driver patch is tested. -- Totus tuus, Glebius. GLEBIUS-RIPN GLEB-RIPE --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=koi8-r Content-Disposition: attachment; filename="newpoll.diff" Index: amd64/amd64/trap.c =================================================================== RCS file: /home/ncvs/src/sys/amd64/amd64/trap.c,v retrieving revision 1.293 diff -u -r1.293 trap.c --- amd64/amd64/trap.c 28 Sep 2005 07:03:02 -0000 1.293 +++ amd64/amd64/trap.c 29 Sep 2005 09:41:32 -0000 @@ -146,11 +146,6 @@ extern char *syscallnames[]; #endif -#ifdef DEVICE_POLLING -extern u_int32_t poll_in_trap; -extern int ether_poll(int count); -#endif /* DEVICE_POLLING */ - /* * Exception, fault, and trap interface to the FreeBSD kernel. * This common code is called from assembly language IDT gate entry @@ -241,11 +236,6 @@ trap_fatal(&frame, frame.tf_addr); } -#ifdef DEVICE_POLLING - if (poll_in_trap) - ether_poll(poll_in_trap); -#endif /* DEVICE_POLLING */ - if (ISPL(frame.tf_cs) == SEL_UPL) { /* user trap */ Index: dev/em/if_em.c =================================================================== RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v retrieving revision 1.72 diff -u -r1.72 if_em.c --- dev/em/if_em.c 20 Sep 2005 14:52:57 -0000 1.72 +++ dev/em/if_em.c 29 Sep 2005 13:26:42 -0000 @@ -197,6 +197,9 @@ static void em_add_int_delay_sysctl(struct adapter *, const char *, const char *, struct em_int_delay_info *, int, int); +#ifdef DEVICE_POLLING +static poll_handler_t em_poll; +#endif /********************************************************************* * FreeBSD Device Interface Entry Points @@ -526,6 +529,11 @@ INIT_DEBUGOUT("em_detach: begin"); +#ifdef DEVICE_POLLING + if (ifp->if_capenable & IFCAP_POLLING) + ether_poll_deregister(ifp); +#endif + EM_LOCK(adapter); adapter->in_detach = 1; em_stop(adapter); @@ -717,7 +725,7 @@
em_initialize_receive_unit(adapter); } #ifdef DEVICE_POLLING - if (!(ifp->if_flags & IFF_POLLING)) + if (!(ifp->if_capenable & IFCAP_POLLING)) #endif em_enable_intr(adapter); EM_UNLOCK(adapter); @@ -732,8 +740,26 @@ IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFCAP (Set Capabilities)"); reinit = 0; mask = ifr->ifr_reqcap ^ ifp->if_capenable; - if (mask & IFCAP_POLLING) - ifp->if_capenable ^= IFCAP_POLLING; +#ifdef DEVICE_POLLING + if (mask & IFCAP_POLLING) { + if (ifr->ifr_reqcap & IFCAP_POLLING) { + error = ether_poll_register(em_poll, ifp); + if (error) + return(error); + EM_LOCK(adapter); + em_disable_intr(adapter); + ifp->if_capenable |= IFCAP_POLLING; + EM_UNLOCK(adapter); + } else { + error = ether_poll_deregister(ifp); + /* Enable interrupt even in error case */ + EM_LOCK(adapter); + em_enable_intr(adapter); + ifp->if_capenable &= ~IFCAP_POLLING; + EM_UNLOCK(adapter); + } + } +#endif if (mask & IFCAP_HWCSUM) { ifp->if_capenable ^= IFCAP_HWCSUM; reinit = 1; @@ -895,7 +921,7 @@ * Only enable interrupts if we are not polling, make sure * they are off otherwise. 
*/ - if (ifp->if_flags & IFF_POLLING) + if (ifp->if_capenable & IFCAP_POLLING) em_disable_intr(adapter); else #endif /* DEVICE_POLLING */ @@ -920,8 +946,6 @@ #ifdef DEVICE_POLLING -static poll_handler_t em_poll; - static void em_poll_locked(struct ifnet *ifp, enum poll_cmd cmd, int count) { @@ -930,14 +954,6 @@ mtx_assert(&adapter->mtx, MA_OWNED); - if (!(ifp->if_capenable & IFCAP_POLLING)) { - ether_poll_deregister(ifp); - cmd = POLL_DEREGISTER; - } - if (cmd == POLL_DEREGISTER) { /* final call, enable interrupts */ - em_enable_intr(adapter); - return; - } if (cmd == POLL_AND_CHECK_STATUS) { reg_icr = E1000_READ_REG(&adapter->hw, ICR); if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC)) { @@ -948,13 +964,10 @@ callout_reset(&adapter->timer, hz, em_local_timer, adapter); } } - if (ifp->if_drv_flags & IFF_DRV_RUNNING) { - em_process_receive_interrupts(adapter, count); - em_clean_transmit_interrupts(adapter); - } + em_process_receive_interrupts(adapter, count); + em_clean_transmit_interrupts(adapter); - if (ifp->if_drv_flags & IFF_DRV_RUNNING && - !IFQ_DRV_IS_EMPTY(&ifp->if_snd)) + if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) em_start_locked(ifp); } @@ -964,7 +977,8 @@ struct adapter *adapter = ifp->if_softc; EM_LOCK(adapter); - em_poll_locked(ifp, cmd, count); + if (ifp->if_drv_flags & IFF_DRV_RUNNING) + em_poll_locked(ifp, cmd, count); EM_UNLOCK(adapter); } #endif /* DEVICE_POLLING */ @@ -987,18 +1001,10 @@ ifp = adapter->ifp; #ifdef DEVICE_POLLING - if (ifp->if_flags & IFF_POLLING) { + if (ifp->if_capenable & IFCAP_POLLING) { EM_UNLOCK(adapter); return; } - - if ((ifp->if_capenable & IFCAP_POLLING) && - ether_poll_register(em_poll, ifp)) { - em_disable_intr(adapter); - em_poll_locked(ifp, 0, 1); - EM_UNLOCK(adapter); - return; - } #endif /* DEVICE_POLLING */ reg_icr = E1000_READ_REG(&adapter->hw, ICR); @@ -1718,9 +1724,7 @@ mtx_assert(&adapter->mtx, MA_OWNED); INIT_DEBUGOUT("em_stop: begin"); -#ifdef DEVICE_POLLING - ether_poll_deregister(ifp); -#endif + 
em_disable_intr(adapter); em_reset_hw(&adapter->hw); callout_stop(&adapter->timer); @@ -1976,7 +1980,6 @@ #ifdef DEVICE_POLLING ifp->if_capabilities |= IFCAP_POLLING; - ifp->if_capenable |= IFCAP_POLLING; #endif /* @@ -2911,12 +2914,13 @@ adapter->fmp = NULL); if (adapter->fmp != NULL) { + struct mbuf *m = adapter->fmp; + adapter->fmp = NULL; EM_UNLOCK(adapter); - (*ifp->if_input)(ifp, adapter->fmp); + (*ifp->if_input)(ifp, m); EM_LOCK(adapter); } #endif - adapter->fmp = NULL; adapter->lmp = NULL; } } else { Index: dev/fxp/if_fxp.c =================================================================== RCS file: /home/ncvs/src/sys/dev/fxp/if_fxp.c,v retrieving revision 1.247 diff -u -r1.247 if_fxp.c --- dev/fxp/if_fxp.c 27 Sep 2005 09:01:10 -0000 1.247 +++ dev/fxp/if_fxp.c 29 Sep 2005 13:28:26 -0000 @@ -773,7 +773,6 @@ #ifdef DEVICE_POLLING /* Inform the world we support polling. */ ifp->if_capabilities |= IFCAP_POLLING; - ifp->if_capenable |= IFCAP_POLLING; #endif /* @@ -891,6 +890,11 @@ { struct fxp_softc *sc = device_get_softc(dev); +#ifdef DEVICE_POLLING + if (sc->ifp->if_capenable & IFCAP_POLLING) + ether_poll_deregister(sc->ifp); +#endif + FXP_LOCK(sc); sc->suspended = 1; /* Do same thing as we do for suspend */ /* @@ -1448,15 +1452,11 @@ uint8_t statack; FXP_LOCK(sc); - if (!(ifp->if_capenable & IFCAP_POLLING)) { - ether_poll_deregister(ifp); - cmd = POLL_DEREGISTER; - } - if (cmd == POLL_DEREGISTER) { /* final call, enable interrupts */ - CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, 0); + if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { FXP_UNLOCK(sc); return; } + statack = FXP_SCB_STATACK_CXTNO | FXP_SCB_STATACK_CNA | FXP_SCB_STATACK_FR; if (cmd == POLL_AND_CHECK_STATUS) { @@ -1495,18 +1495,10 @@ } #ifdef DEVICE_POLLING - if (ifp->if_flags & IFF_POLLING) { + if (ifp->if_capenable & IFCAP_POLLING) { FXP_UNLOCK(sc); return; } - if ((ifp->if_capenable & IFCAP_POLLING) && - ether_poll_register(fxp_poll, ifp)) { - /* disable interrupts */ - CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL,
FXP_SCB_INTR_DISABLE); - FXP_UNLOCK(sc); - fxp_poll(ifp, 0, 1); - return; - } #endif while ((statack = CSR_READ_1(sc, FXP_CSR_SCB_STATACK)) != 0) { /* @@ -1837,9 +1829,6 @@ ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); ifp->if_timer = 0; -#ifdef DEVICE_POLLING - ether_poll_deregister(ifp); -#endif /* * Cancel stats updater. */ @@ -2163,7 +2152,7 @@ * ... but only do that if we are not polling. And because (presumably) * the default is interrupts on, we need to disable them explicitly! */ - if ( ifp->if_flags & IFF_POLLING ) + if (ifp->if_capenable & IFCAP_POLLING ) CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, FXP_SCB_INTR_DISABLE); else #endif /* DEVICE_POLLING */ @@ -2418,11 +2407,30 @@ break; case SIOCSIFCAP: - FXP_LOCK(sc); mask = ifp->if_capenable ^ ifr->ifr_reqcap; - if (mask & IFCAP_POLLING) - ifp->if_capenable ^= IFCAP_POLLING; +#ifdef DEVICE_POLLING + if (mask & IFCAP_POLLING) { + if (ifr->ifr_reqcap & IFCAP_POLLING) { + error = ether_poll_register(fxp_poll, ifp); + if (error) + return(error); + FXP_LOCK(sc); + CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, + FXP_SCB_INTR_DISABLE); + ifp->if_capenable |= IFCAP_POLLING; + FXP_UNLOCK(sc); + } else { + error = ether_poll_deregister(ifp); + /* Enable interrupts in any case */ + FXP_LOCK(sc); + CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, 0); + ifp->if_capenable &= ~IFCAP_POLLING; + FXP_UNLOCK(sc); + } + } +#endif if (mask & IFCAP_VLAN_MTU) { + FXP_LOCK(sc); ifp->if_capenable ^= IFCAP_VLAN_MTU; if (sc->revision != FXP_REV_82557) flag = FXP_FLAG_LONG_PKT_EN; @@ -2431,8 +2439,8 @@ sc->flags ^= flag; if (ifp->if_flags & IFF_UP) fxp_init_body(sc); + FXP_UNLOCK(sc); } - FXP_UNLOCK(sc); break; default: Index: i386/i386/trap.c =================================================================== RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v retrieving revision 1.280 diff -u -r1.280 trap.c --- i386/i386/trap.c 28 Sep 2005 07:03:03 -0000 1.280 +++ i386/i386/trap.c 29 Sep 2005 09:42:21 -0000 @@ -160,11 +160,6 @@ extern char 
*syscallnames[]; #endif -#ifdef DEVICE_POLLING -extern u_int32_t poll_in_trap; -extern int ether_poll(int count); -#endif /* DEVICE_POLLING */ - /* * Exception, fault, and trap interface to the FreeBSD kernel. * This common code is called from assembly language IDT gate entry @@ -272,11 +267,6 @@ trap_fatal(&frame, eva); } -#ifdef DEVICE_POLLING - if (poll_in_trap) - ether_poll(poll_in_trap); -#endif /* DEVICE_POLLING */ - if ((ISPL(frame.tf_cs) == SEL_UPL) || ((frame.tf_eflags & PSL_VM) && !(PCPU_GET(curpcb)->pcb_flags & PCB_VM86CALL))) { Index: kern/kern_poll.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_poll.c,v retrieving revision 1.22 diff -u -r1.22 kern_poll.c --- kern/kern_poll.c 6 Sep 2005 11:09:18 -0000 1.22 +++ kern/kern_poll.c 29 Sep 2005 12:26:58 -0000 @@ -32,6 +32,7 @@ #include #include #include /* needed by net/if.h */ +#include #include #include @@ -44,14 +45,15 @@ static void netisr_poll(void); /* the two netisr handlers */ static void netisr_pollmore(void); +static int poll_switch(SYSCTL_HANDLER_ARGS); void hardclock_device_poll(void); /* hook from hardclock */ -void ether_poll(int); /* polling while in trap */ +void ether_poll(int); /* polling in idle loop */ /* * Polling support for [network] device drivers. * - * Drivers which support this feature try to register with the + * Drivers which support this feature can register with the * polling code. * * If registration is successful, the driver must disable interrupts, @@ -64,10 +66,6 @@ * POLL_AND_CHECK_STATUS: as above, plus check status registers or do * other more expensive operations. This command is issued periodically * but less frequently than POLL_ONLY. - * POLL_DEREGISTER: deregister and return to interrupt mode. - * - * The first two commands are only issued if the interface is marked as - * 'IFF_UP and IFF_DRV_RUNNING', the last one only if IFF_DRV_RUNNING is set. 
* * The count limit specifies how much work the handler can do during the * call -- typically this is the number of packets to be received, or @@ -75,11 +73,9 @@ * as the max time spent in the function grows roughly linearly with the * count). * - * Deregistration can be requested by the driver itself (typically in the - * *_stop() routine), or by the polling code, by invoking the handler. - * - * Polling can be globally enabled or disabled with the sysctl variable - * kern.polling.enable (default is 0, disabled) + * Polling is enabled and disabled via setting IFCAP_POLLING flag on + * the interface. The driver ioctl handler should register interface + * with polling and disable interrupts, if registration was successfull. * * A second variable controls the sharing of CPU between polling/kernel * network processing, and other activities (typically userlevel tasks): @@ -91,7 +87,7 @@ * The following constraints hold * * 1 <= poll_each_burst <= poll_burst <= poll_burst_max - * 0 <= poll_in_trap <= poll_each_burst + * 0 <= poll_each_burst * MIN_POLL_BURST_MAX <= poll_burst_max <= MAX_POLL_BURST_MAX */ @@ -117,10 +113,6 @@ SYSCTL_UINT(_kern_polling, OID_AUTO, idle_poll, CTLFLAG_RW, &poll_in_idle_loop, 0, "Enable device polling in idle loop"); -u_int32_t poll_in_trap; /* used in trap.c */ -SYSCTL_UINT(_kern_polling, OID_AUTO, poll_in_trap, CTLFLAG_RW, - &poll_in_trap, 0, "Poll burst size during a trap"); - static u_int32_t user_frac = 50; SYSCTL_UINT(_kern_polling, OID_AUTO, user_frac, CTLFLAG_RW, &user_frac, 0, "Desired user fraction of cpu time"); @@ -149,9 +141,9 @@ SYSCTL_UINT(_kern_polling, OID_AUTO, handlers, CTLFLAG_RD, &poll_handlers, 0, "Number of registered poll handlers"); -static int polling = 0; /* global polling enable */ -SYSCTL_UINT(_kern_polling, OID_AUTO, enable, CTLFLAG_RW, - &polling, 0, "Polling enabled"); +static int polling = 0; +SYSCTL_PROC(_kern_polling, OID_AUTO, enable, CTLTYPE_UINT | CTLFLAG_RW, + 0, sizeof(int), poll_switch, "I", "Switch 
polling for all interfaces"); static u_int32_t phase; SYSCTL_UINT(_kern_polling, OID_AUTO, phase, CTLFLAG_RW, @@ -174,23 +166,9 @@ struct pollrec { poll_handler_t *handler; struct ifnet *ifp; - /* - * Flags of polling record (protected by poll_mtx). - * PRF_RUNNING means that the handler is now executing. - * PRF_LEAVING means that the handler is now deregistering. - */ -#define PRF_RUNNING 0x1 -#define PRF_LEAVING 0x2 - uint32_t flags; }; static struct pollrec pr[POLL_LIST_LEN]; - -#define PR_VALID(i) (pr[(i)].handler != NULL && \ - !(pr[(i)].flags & (PRF_RUNNING|PRF_LEAVING)) && \ - (pr[(i)].ifp->if_drv_flags & IFF_DRV_RUNNING) &&\ - (pr[(i)].ifp->if_flags & IFF_UP)) - static struct mtx poll_mtx; static void @@ -258,30 +236,24 @@ } /* - * ether_poll is called from the idle loop or from the trap handler. + * ether_poll is called from the idle loop. */ void ether_poll(int count) { int i; + NET_LOCK_GIANT(); mtx_lock(&poll_mtx); if (count > poll_each_burst) count = poll_each_burst; - for (i = 0 ; i < poll_handlers ; i++) { - if (PR_VALID(i)) { - pr[i].flags |= PRF_RUNNING; - mtx_unlock(&poll_mtx); - NET_LOCK_GIANT(); - pr[i].handler(pr[i].ifp, POLL_ONLY, count); - NET_UNLOCK_GIANT(); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_RUNNING; - } - } + for (i = 0 ; i < poll_handlers ; i++) + pr[i].handler(pr[i].ifp, POLL_ONLY, count); + mtx_unlock(&poll_mtx); + NET_UNLOCK_GIANT(); } /* @@ -403,60 +375,29 @@ residual_burst : poll_each_burst; residual_burst -= cycles; - if (polling) { - for (i = 0 ; i < poll_handlers ; i++) { - if (PR_VALID(i)) { - pr[i].flags |= PRF_RUNNING; - mtx_unlock(&poll_mtx); - pr[i].handler(pr[i].ifp, arg, cycles); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_RUNNING; - } - } - } else { /* unregister */ - for (i = 0 ; i < poll_handlers ; i++) { - if (pr[i].handler != NULL && - pr[i].ifp->if_drv_flags & IFF_DRV_RUNNING) { - pr[i].ifp->if_flags &= ~IFF_POLLING; - pr[i].flags |= PRF_LEAVING; - mtx_unlock(&poll_mtx); - pr[i].handler(pr[i].ifp, 
POLL_DEREGISTER, 1); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_LEAVING; - } - pr[i].handler = NULL; - } - residual_burst = 0; - poll_handlers = 0; - } + for (i = 0 ; i < poll_handlers ; i++) + pr[i].handler(pr[i].ifp, arg, cycles); phase = 4; mtx_unlock(&poll_mtx); } /* - * Try to register routine for polling. Returns 1 if successful - * (and polling should be enabled), 0 otherwise. + * Try to register routine for polling. Returns 0 if successful + * (and polling should be enabled), error code otherwise. * A device is not supposed to register itself multiple times. * - * This is called from within the *_intr() functions, so we do not need - * further ifnet locking. + * This is called from within the *_ioctl() functions. */ int ether_poll_register(poll_handler_t *h, struct ifnet *ifp) { int i; - NET_ASSERT_GIANT(); + KASSERT(h != NULL, ("%s: handler is NULL", __func__)); + KASSERT(ifp != NULL, ("%s: ifp is NULL", __func__)); - if (polling == 0) /* polling disabled, cannot register */ - return 0; - if (h == NULL || ifp == NULL) /* bad arguments */ - return 0; - if ( !(ifp->if_flags & IFF_UP) ) /* must be up */ - return 0; - if (ifp->if_flags & IFF_POLLING) /* already polling */ - return 0; + NET_ASSERT_GIANT(); mtx_lock(&poll_mtx); if (poll_handlers >= POLL_LIST_LEN) { @@ -474,7 +415,7 @@ verbose--; } mtx_unlock(&poll_mtx); - return 0; /* no polling for you */ + return (ENOMEM); /* no polling for you */ } for (i = 0 ; i < poll_handlers ; i++) @@ -482,45 +423,39 @@ mtx_unlock(&poll_mtx); log(LOG_DEBUG, "ether_poll_register: %s: handler" " already registered\n", ifp->if_xname); - return (0); + return (EEXIST); } pr[poll_handlers].handler = h; pr[poll_handlers].ifp = ifp; poll_handlers++; - ifp->if_flags |= IFF_POLLING; mtx_unlock(&poll_mtx); if (idlepoll_sleeping) wakeup(&idlepoll_sleeping); - return 1; /* polling enabled in next call */ + return (0); } /* - * Remove interface from the polling list. Normally called by *_stop(). 
- * It is not an error to call it with IFF_POLLING clear, the call is - * sufficiently rare to be preferable to save the space for the extra - * test in each driver in exchange of one additional function call. + * Remove interface from the polling list. Called from *_ioctl(), too. */ int ether_poll_deregister(struct ifnet *ifp) { int i; - NET_ASSERT_GIANT(); + KASSERT(ifp != NULL, ("%s: ifp is NULL", __func__)); - if ( !ifp || !(ifp->if_flags & IFF_POLLING) ) { - return 0; - } + NET_ASSERT_GIANT(); mtx_lock(&poll_mtx); + for (i = 0 ; i < poll_handlers ; i++) if (pr[i].ifp == ifp) /* found it */ break; - ifp->if_flags &= ~IFF_POLLING; /* found or not... */ if (i == poll_handlers) { - mtx_unlock(&poll_mtx); log(LOG_DEBUG, "ether_poll_deregister: %s: not found!\n", ifp->if_xname); - return (0); + mtx_unlock(&poll_mtx); + return (ENOENT); } poll_handlers--; if (i < poll_handlers) { /* Last entry replaces this one. */ @@ -528,7 +463,60 @@ pr[i].ifp = pr[poll_handlers].ifp; } mtx_unlock(&poll_mtx); - return (1); + return (0); +} + +/* + * Legacy interface for turning polling on all interfaces at one time. 
+ */ +static int +poll_switch(SYSCTL_HANDLER_ARGS) +{ + struct ifnet *ifp; + int error; + int val; + + mtx_lock(&poll_mtx); + val = polling; + mtx_unlock(&poll_mtx); + + error = sysctl_handle_int(oidp, &val, sizeof(int), req); + if (error || !req->newptr ) + return (error); + + if (val == polling) + return (0); + + if (val < 0 || val > 1) + return (EINVAL); + + mtx_lock(&poll_mtx); + polling = val; + mtx_unlock(&poll_mtx); + + NET_LOCK_GIANT(); + IFNET_RLOCK(); + TAILQ_FOREACH(ifp, &ifnet, if_link) { + if (ifp->if_capabilities & IFCAP_POLLING) { + struct ifreq ifr; + + if (val == 1) + ifr.ifr_reqcap = + ifp->if_capenable | IFCAP_POLLING; + else + ifr.ifr_reqcap = + ifp->if_capenable & ~IFCAP_POLLING; + IFF_LOCKGIANT(ifp); /* LOR here */ + (void) (*ifp->if_ioctl)(ifp, SIOCSIFCAP, (caddr_t)&ifr); + IFF_UNLOCKGIANT(ifp); + } + } + IFNET_RUNLOCK(); + NET_UNLOCK_GIANT(); + + log(LOG_ERR, "kern.polling.enable is deprecated. Use ifconfig(8)"); + + return (0); } static void Index: net/if.h =================================================================== RCS file: /home/ncvs/src/sys/net/if.h,v retrieving revision 1.98 diff -u -r1.98 if.h --- net/if.h 9 Aug 2005 12:56:20 -0000 1.98 +++ net/if.h 29 Sep 2005 11:25:05 -0000 @@ -148,7 +148,7 @@ #define IFF_LINK2 0x4000 /* per link layer defined bit */ #define IFF_ALTPHYS IFF_LINK2 /* use alternate physical connection */ #define IFF_MULTICAST 0x8000 /* (i) supports multicast */ -#define IFF_POLLING 0x10000 /* (n) Interface is in polling mode. 
*/ +/* 0x10000 */ #define IFF_PPROMISC 0x20000 /* (n) user-requested promisc mode */ #define IFF_MONITOR 0x40000 /* (n) user-requested monitor mode */ #define IFF_STATICARP 0x80000 /* (n) static ARP */ @@ -166,8 +166,7 @@ /* flags set internally only: */ #define IFF_CANTCHANGE \ (IFF_BROADCAST|IFF_POINTOPOINT|IFF_DRV_RUNNING|IFF_DRV_OACTIVE|\ - IFF_SIMPLEX|IFF_MULTICAST|IFF_ALLMULTI|IFF_SMART|IFF_PROMISC|\ - IFF_POLLING) + IFF_SIMPLEX|IFF_MULTICAST|IFF_ALLMULTI|IFF_SMART|IFF_PROMISC) /* * Values for if_link_state. Index: net/if_var.h =================================================================== RCS file: /home/ncvs/src/sys/net/if_var.h,v retrieving revision 1.102 diff -u -r1.102 if_var.h --- net/if_var.h 9 Aug 2005 10:16:17 -0000 1.102 +++ net/if_var.h 29 Sep 2005 10:19:43 -0000 @@ -660,7 +660,7 @@ LLADDR((struct sockaddr_dl *) ifaddr_byindex((ifp)->if_index)->ifa_addr) #ifdef DEVICE_POLLING -enum poll_cmd { POLL_ONLY, POLL_AND_CHECK_STATUS, POLL_DEREGISTER }; +enum poll_cmd { POLL_ONLY, POLL_AND_CHECK_STATUS }; typedef void poll_handler_t(struct ifnet *ifp, enum poll_cmd cmd, int count); int ether_poll_register(poll_handler_t *h, struct ifnet *ifp); --k+w/mQv8wyuph6w0-- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 12:43:53 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6F2DB16A41F; Fri, 30 Sep 2005 12:43:53 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id E490343D49; Fri, 30 Sep 2005 12:43:52 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id 6EF3BBC9A; Fri, 30 Sep 2005 12:43:51 +0000 (UTC) To: Gleb Smirnoff From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 30 Sep 2005 16:40:00 +0400." 
<20050930124000.GA45345@cell.sick.ru> Date: Fri, 30 Sep 2005 14:43:51 +0200 Message-ID: <32170.1128084231@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 12:43:53 -0000 I still think we should stop having this network-centric view of polling and implement _real_ *device* polling, so that other device types can use it as well. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 12:47:26 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DDF0616A41F; Fri, 30 Sep 2005 12:47:26 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (cell.sick.ru [217.72.144.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3F88343D4C; Fri, 30 Sep 2005 12:47:26 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (glebius@localhost [127.0.0.1]) by cell.sick.ru (8.13.3/8.13.3) with ESMTP id j8UClOFR045802 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Sep 2005 16:47:25 +0400 (MSD) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.sick.ru (8.13.3/8.13.1/Submit) id j8UClObU045801; Fri, 30 Sep 2005 16:47:24 +0400 (MSD) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.sick.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Fri, 30 Sep 2005 16:47:24 +0400 From: Gleb Smirnoff To: Poul-Henning Kamp 
Message-ID: <20050930124724.GB45345@cell.sick.ru> References: <20050930124000.GA45345@cell.sick.ru> <32170.1128084231@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <32170.1128084231@critter.freebsd.dk> User-Agent: Mutt/1.5.6i Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 12:47:27 -0000 On Fri, Sep 30, 2005 at 02:43:51PM +0200, Poul-Henning Kamp wrote: P> I still think we should stop having this network-centric view of P> polling and implement _real_ *device* polling, so that other P> device types can use it as well. I agree with both hands. My current work is aimed at RELENG_6 only. btw, I've made a step in this direction already - kern_poll can now forget about struct ifnet. -- Totus tuus, Glebius. 
GLEBIUS-RIPN GLEB-RIPE From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 12:47:37 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6D0E116A42F; Fri, 30 Sep 2005 12:47:37 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2935843D48; Fri, 30 Sep 2005 12:47:37 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by cyrus.watson.org (Postfix) with ESMTP id 4555F46B11; Fri, 30 Sep 2005 08:47:36 -0400 (EDT) Date: Fri, 30 Sep 2005 13:47:36 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Poul-Henning Kamp In-Reply-To: <32170.1128084231@critter.freebsd.dk> Message-ID: <20050930134526.R71864@fledge.watson.org> References: <32170.1128084231@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, Gleb Smirnoff , net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 12:47:37 -0000 On Fri, 30 Sep 2005, Poul-Henning Kamp wrote: > I still think we should stop having this network-centric view of polling > and implement _real_ *device* polling, so that other device types can > use it as well. While I agree that we should offer polling to non-network device drivers also, I think it's worth observing that the network awareness of our current polling code has some interesting advantages. 
For one thing, the framework itself is aware of the notion of batching and moderating the workload as it is aware of the number of mbufs being processed, and knows to bound the workload, etc. We'll need to revisit many of the ideas in the current polling implementation (designed largely around 4.x operating assumptions) anyway, but I think it's important we understand some of the implicit design benefits that are present in the current system as well... Robert N M Watson From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 12:57:11 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7AC0816A41F; Fri, 30 Sep 2005 12:57:11 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id 22D2643D48; Fri, 30 Sep 2005 12:57:11 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id 8593FBC96; Fri, 30 Sep 2005 12:57:09 +0000 (UTC) To: Robert Watson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 30 Sep 2005 13:47:36 BST."
<20050930134526.R71864@fledge.watson.org> Date: Fri, 30 Sep 2005 14:57:08 +0200 Message-ID: <32249.1128085028@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.org, Gleb Smirnoff , net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 12:57:11 -0000 In message <20050930134526.R71864@fledge.watson.org>, Robert Watson writes: >On Fri, 30 Sep 2005, Poul-Henning Kamp wrote: > >> I still think we should stop having this network-centric view of polling >> and implement _real_ *device* polling, so that other device types can >> use it as well. > >While I agree that we should offer polling to non-network device drivers >also, I think it's worth observing that the network awareness of our >current polling code has some interesting advantages. [...] None of which could not be implemented on top of a general polling facility, and some of which makes polling unusable with high end networking hardware like a 8x1Gige card where all ports are handled by the same interrupt. Anyway, I just wanted to make the point, not start a long discussion. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
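As an aside for readers following the kern_poll.c part of the patch under review: the new ether_poll_register()/ether_poll_deregister() contract (0 on success; ENOMEM when the table is full; EEXIST for an interface that is already registered; ENOENT when deregistering an unknown interface; the last entry replacing a removed one) can be sketched in userspace C roughly as below. This is only an illustration, not the kernel code: the names poll_register_sketch, poll_deregister_sketch, struct pollrec_sketch and dummy_poll are hypothetical, the table size of 4 is arbitrary, locking (poll_mtx) is omitted, and matching on the device pointer for the duplicate check is an assumption, since that condition is elided by the diff context.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POLL_LIST_LEN 4	/* arbitrary small table for the sketch */

/* Hypothetical handler type; the kernel's poll_handler_t takes an ifnet. */
typedef void poll_handler_fn(void *dev, int count);

struct pollrec_sketch {
	poll_handler_fn	*handler;
	void		*dev;
};

static struct pollrec_sketch pr[POLL_LIST_LEN];
static int poll_handlers;	/* number of valid entries in pr[] */

/* Placeholder handler so callers have something to register. */
static void
dummy_poll(void *dev, int count)
{
	(void)dev;
	(void)count;
}

/* Returns 0 on success, ENOMEM if the table is full, EEXIST on duplicate. */
static int
poll_register_sketch(poll_handler_fn *h, void *dev)
{
	int i;

	assert(h != NULL && dev != NULL);	/* stands in for KASSERT */

	if (poll_handlers >= POLL_LIST_LEN)
		return (ENOMEM);
	for (i = 0; i < poll_handlers; i++)
		if (pr[i].dev == dev)		/* assumed duplicate test */
			return (EEXIST);
	pr[poll_handlers].handler = h;
	pr[poll_handlers].dev = dev;
	poll_handlers++;
	return (0);
}

/* Returns 0 on success, ENOENT if the device was never registered. */
static int
poll_deregister_sketch(void *dev)
{
	int i;

	assert(dev != NULL);

	for (i = 0; i < poll_handlers; i++)
		if (pr[i].dev == dev)
			break;
	if (i == poll_handlers)
		return (ENOENT);
	poll_handlers--;
	if (i < poll_handlers)		/* last entry replaces this one */
		pr[i] = pr[poll_handlers];
	return (0);
}
```

A caller in the spirit of the SIOCSIFCAP handlers in the patch would invoke poll_register_sketch(dummy_poll, dev) first and only touch interrupt state or capability flags once it returns 0; on deregistration it would re-enable interrupts regardless of the return value, as the em and fxp hunks do.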
From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 14:06:07 2005 Return-Path: X-Original-To: freebsd-arch@freebsd.org Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8FDCE16A427; Fri, 30 Sep 2005 14:06:07 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from mv.twc.weather.com (mv.twc.weather.com [65.212.71.225]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2397943D48; Fri, 30 Sep 2005 14:06:05 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: from [10.50.41.233] (Not Verified[10.50.41.233]) by mv.twc.weather.com with NetIQ MailMarshal (v6, 0, 3, 8) id ; Fri, 30 Sep 2005 10:22:03 -0400 From: John Baldwin To: freebsd-arch@freebsd.org Date: Fri, 30 Sep 2005 10:07:09 -0400 User-Agent: KMail/1.8 References: <20050930124000.GA45345@cell.sick.ru> In-Reply-To: <20050930124000.GA45345@cell.sick.ru> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-6" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200509301007.10494.jhb@FreeBSD.org> Cc: Gleb Smirnoff , net@freebsd.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 14:06:07 -0000 On Friday 30 September 2005 08:40 am, Gleb Smirnoff wrote: > [please, follow-up on net@ only] > > Colleagues, > > here are some patches for review. > > Problems addressed: > > 1) When Giant was removed from polling a problem was introduced. The idle > poll feature was broken. The idle poll thread can enter polling handler on > one interface and put to sleep for a long time, until CPU resources found. > During this time no traffic is received on interface. Well, this is what > idle thread is supposed to do. Why didn't this happen with Giant?
Because > idle poll entered poll handler holding Giant, and other threads (in > particular netisr poll) contested on Giant and propagated their priority to > idle poll. Well, this is a hack, but idle poll significantly improves > polling performance on an idle box, that's why I won't axe it but try to > fix it. > > To address the problem we need to use the same technique as before, but > use poll_mtx instead of Giant. However, this will resurrect LORs, that were > fixed with Giant removal. The alternative lock path happens, when driver > decides to deregister from polling itself. The LOR is fixed by further > changes. See 3). > > 2) Drivers indicate their ability to do polling(4) with IFCAP_POLLING > flag in if_capabilities field. Setting the flag in if_capenable should > register interface with polling and disable interrupts. However, the > if_flags is also abused with IFF_POLLING flag. The aim is to remove > IFF_POLLING flag. > > 3) The polling is switched on and off not functionally. That is, when you > say 'sysctl kern.polling.enable=1' or 'ifconfig fxp0 -polling', the polling > is switched on/off not immediately but on next tick or next interrupt. This > non-functional approach leads to a lot of ambiguities in code, makes it > harder to understand and maintain. It also exposes race conditions. > > The attached patch removes: > - IFF_POLLING flag. > - Use of if_flags, if_drv_flags, if_capenable from kern_poll.c. > All current accesses to these fields are not locked and polling > shouldn't look there. > - poll in trap feature. Sorry, we can't acquire mutexes in trap(). Anyone > used it, anyway? > - POLL_DEREGISTER command. No hacks. Everything is done functionally via > ioctl(). > > The new world order for driver is the following: > > 1) Declare IFCAP_POLLING in if_capabilities on attach. Do not touch > if_capenable.
2) in ioctl method, in SIOCSIFCAP case the driver should: > - call ether_poll_[de]register > - if no error, set the IFCAP_POLLING flag in if_capenable > - obtain driver lock > - [dis/en]able interrupts > - drop driver lock > 3) In poll method, check IFF_DRV_RUNNING flag after obtaining driver lock > 4) In interrupt handler check IFCAP_POLLING flag in if_capenable. If > present, then return. This is important to protect from spurious > interrupts. 5) In device detach method, call ether_poll_deregister() before > obtaining driver lock. From my limited experience with locking various NIC drivers, I like this change. I think it is much better to tweak the polling state in the ioctl() handler rather than in the poll handler. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 16:03:06 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C36616A41F; Fri, 30 Sep 2005 16:03:06 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (cell.sick.ru [217.72.144.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3F70F43D53; Fri, 30 Sep 2005 16:03:04 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (glebius@localhost [127.0.0.1]) by cell.sick.ru (8.13.3/8.13.3) with ESMTP id j8UG33PJ048028 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Sep 2005 20:03:03 +0400 (MSD) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.sick.ru (8.13.3/8.13.1/Submit) id j8UG33BF048027; Fri, 30 Sep 2005 20:03:03 +0400 (MSD) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.sick.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Fri, 30 Sep 2005 20:03:02 +0400 From: Gleb Smirnoff To: arch@FreeBSD.org,
net@FreeBSD.org Message-ID: <20050930160302.GJ45345@cell.sick.ru> References: <20050930124000.GA45345@cell.sick.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="5/uDoXvLw7AC5HRs" Content-Disposition: inline In-Reply-To: <20050930124000.GA45345@cell.sick.ru> User-Agent: Mutt/1.5.6i Cc: Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 16:03:06 -0000 --5/uDoXvLw7AC5HRs Content-Type: text/plain; charset=koi8-r Content-Disposition: inline On Fri, Sep 30, 2005 at 04:40:00PM +0400, Gleb Smirnoff wrote: T> [please, follow-up on net@ only] T> T> Colleagues, T> T> here are some patches for review. I made some changes to the patch after the last compile, and hadn't tested them before sending the patch. Here is an updated one. -- Totus tuus, Glebius. GLEBIUS-RIPN GLEB-RIPE --5/uDoXvLw7AC5HRs Content-Type: text/plain; charset=koi8-r Content-Disposition: attachment; filename="newpoll.diff" Index: amd64/amd64/trap.c =================================================================== RCS file: /home/ncvs/src/sys/amd64/amd64/trap.c,v retrieving revision 1.293 diff -u -r1.293 trap.c --- amd64/amd64/trap.c 28 Sep 2005 07:03:02 -0000 1.293 +++ amd64/amd64/trap.c 29 Sep 2005 09:41:32 -0000 @@ -146,11 +146,6 @@ extern char *syscallnames[]; #endif -#ifdef DEVICE_POLLING -extern u_int32_t poll_in_trap; -extern int ether_poll(int count); -#endif /* DEVICE_POLLING */ - /* * Exception, fault, and trap interface to the FreeBSD kernel.
* This common code is called from assembly language IDT gate entry @@ -241,11 +236,6 @@ trap_fatal(&frame, frame.tf_addr); } -#ifdef DEVICE_POLLING - if (poll_in_trap) - ether_poll(poll_in_trap); -#endif /* DEVICE_POLLING */ - if (ISPL(frame.tf_cs) == SEL_UPL) { /* user trap */ Index: dev/em/if_em.c =================================================================== RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v retrieving revision 1.73 diff -u -r1.73 if_em.c --- dev/em/if_em.c 29 Sep 2005 13:23:34 -0000 1.73 +++ dev/em/if_em.c 30 Sep 2005 15:17:05 -0000 @@ -197,6 +197,9 @@ static void em_add_int_delay_sysctl(struct adapter *, const char *, const char *, struct em_int_delay_info *, int, int); +#ifdef DEVICE_POLLING +static poll_handler_t em_poll; +#endif /********************************************************************* * FreeBSD Device Interface Entry Points @@ -526,6 +529,11 @@ INIT_DEBUGOUT("em_detach: begin"); +#ifdef DEVICE_POLLING + if (ifp->if_capenable & IFCAP_POLLING) + ether_poll_deregister(ifp); +#endif + EM_LOCK(adapter); adapter->in_detach = 1; em_stop(adapter); @@ -717,7 +725,7 @@ em_initialize_receive_unit(adapter); } #ifdef DEVICE_POLLING - if (!(ifp->if_flags & IFF_POLLING)) + if (!(ifp->if_capenable & IFCAP_POLLING)) #endif em_enable_intr(adapter); EM_UNLOCK(adapter); @@ -732,8 +740,26 @@ IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFCAP (Set Capabilities)"); reinit = 0; mask = ifr->ifr_reqcap ^ ifp->if_capenable; - if (mask & IFCAP_POLLING) - ifp->if_capenable ^= IFCAP_POLLING; +#ifdef DEVICE_POLLING + if (mask & IFCAP_POLLING) { + if (ifr->ifr_reqcap & IFCAP_POLLING) { + error = ether_poll_register(em_poll, ifp); + if (error) + return(error); + EM_LOCK(adapter); + em_disable_intr(adapter); + ifp->if_capenable |= IFCAP_POLLING; + EM_UNLOCK(adapter); + } else { + error = ether_poll_deregister(ifp); + /* Enable interrupt even in error case */ + EM_LOCK(adapter); + em_enable_intr(adapter); + ifp->if_capenable &= ~IFCAP_POLLING; + EM_UNLOCK(adapter); + } + 
} +#endif if (mask & IFCAP_HWCSUM) { ifp->if_capenable ^= IFCAP_HWCSUM; reinit = 1; @@ -895,7 +921,7 @@ * Only enable interrupts if we are not polling, make sure * they are off otherwise. */ - if (ifp->if_flags & IFF_POLLING) + if (ifp->if_capenable & IFCAP_POLLING) em_disable_intr(adapter); else #endif /* DEVICE_POLLING */ @@ -920,8 +946,6 @@ #ifdef DEVICE_POLLING -static poll_handler_t em_poll; - static void em_poll_locked(struct ifnet *ifp, enum poll_cmd cmd, int count) { @@ -930,14 +954,6 @@ mtx_assert(&adapter->mtx, MA_OWNED); - if (!(ifp->if_capenable & IFCAP_POLLING)) { - ether_poll_deregister(ifp); - cmd = POLL_DEREGISTER; - } - if (cmd == POLL_DEREGISTER) { /* final call, enable interrupts */ - em_enable_intr(adapter); - return; - } if (cmd == POLL_AND_CHECK_STATUS) { reg_icr = E1000_READ_REG(&adapter->hw, ICR); if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC)) { @@ -948,13 +964,10 @@ callout_reset(&adapter->timer, hz, em_local_timer, adapter); } } - if (ifp->if_drv_flags & IFF_DRV_RUNNING) { - em_process_receive_interrupts(adapter, count); - em_clean_transmit_interrupts(adapter); - } + em_process_receive_interrupts(adapter, count); + em_clean_transmit_interrupts(adapter); - if (ifp->if_drv_flags & IFF_DRV_RUNNING && - !IFQ_DRV_IS_EMPTY(&ifp->if_snd)) + if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) em_start_locked(ifp); } @@ -964,7 +977,8 @@ struct adapter *adapter = ifp->if_softc; EM_LOCK(adapter); - em_poll_locked(ifp, cmd, count); + if (ifp->if_drv_flags & IFF_DRV_RUNNING) + em_poll_locked(ifp, cmd, count); EM_UNLOCK(adapter); } #endif /* DEVICE_POLLING */ @@ -987,18 +1001,10 @@ ifp = adapter->ifp; #ifdef DEVICE_POLLING - if (ifp->if_flags & IFF_POLLING) { + if (ifp->if_capenable & IFCAP_POLLING) { EM_UNLOCK(adapter); return; } - - if ((ifp->if_capenable & IFCAP_POLLING) && - ether_poll_register(em_poll, ifp)) { - em_disable_intr(adapter); - em_poll_locked(ifp, 0, 1); - EM_UNLOCK(adapter); - return; - } #endif /* DEVICE_POLLING */ reg_icr = 
E1000_READ_REG(&adapter->hw, ICR); @@ -1718,9 +1724,7 @@ mtx_assert(&adapter->mtx, MA_OWNED); INIT_DEBUGOUT("em_stop: begin"); -#ifdef DEVICE_POLLING - ether_poll_deregister(ifp); -#endif + em_disable_intr(adapter); em_reset_hw(&adapter->hw); callout_stop(&adapter->timer); @@ -1976,7 +1980,6 @@ #ifdef DEVICE_POLLING ifp->if_capabilities |= IFCAP_POLLING; - ifp->if_capenable |= IFCAP_POLLING; #endif /* Index: dev/fxp/if_fxp.c =================================================================== RCS file: /home/ncvs/src/sys/dev/fxp/if_fxp.c,v retrieving revision 1.247 diff -u -r1.247 if_fxp.c --- dev/fxp/if_fxp.c 27 Sep 2005 09:01:10 -0000 1.247 +++ dev/fxp/if_fxp.c 30 Sep 2005 15:37:08 -0000 @@ -773,7 +773,6 @@ #ifdef DEVICE_POLLING /* Inform the world we support polling. */ ifp->if_capabilities |= IFCAP_POLLING; - ifp->if_capenable |= IFCAP_POLLING; #endif /* @@ -891,6 +890,11 @@ { struct fxp_softc *sc = device_get_softc(dev); +#ifdef DEVICE_POLLING + if (sc->ifp->if_capenable & IFCAP_POLLING) + ether_poll_deregister(sc->ifp); +#endif + FXP_LOCK(sc); sc->suspended = 1; /* Do same thing as we do for suspend */ /* @@ -1448,15 +1452,11 @@ uint8_t statack; FXP_LOCK(sc); - if (!(ifp->if_capenable & IFCAP_POLLING)) { - ether_poll_deregister(ifp); - cmd = POLL_DEREGISTER; - } - if (cmd == POLL_DEREGISTER) { /* final call, enable interrupts */ - CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, 0); + if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { FXP_UNLOCK(sc); return; } + statack = FXP_SCB_STATACK_CXTNO | FXP_SCB_STATACK_CNA | FXP_SCB_STATACK_FR; if (cmd == POLL_AND_CHECK_STATUS) { @@ -1495,18 +1495,10 @@ } #ifdef DEVICE_POLLING - if (ifp->if_flags & IFF_POLLING) { + if (ifp->if_capenable & IFCAP_POLLING) { FXP_UNLOCK(sc); return; } - if ((ifp->if_capenable & IFCAP_POLLING) && - ether_poll_register(fxp_poll, ifp)) { - /* disable interrupts */ - CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, FXP_SCB_INTR_DISABLE); - FXP_UNLOCK(sc); - fxp_poll(ifp, 0, 1); - return; - } #endif while ((statack = 
CSR_READ_1(sc, FXP_CSR_SCB_STATACK)) != 0) { /* @@ -1837,9 +1829,6 @@ ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); ifp->if_timer = 0; -#ifdef DEVICE_POLLING - ether_poll_deregister(ifp); -#endif /* * Cancel stats updater. */ @@ -2163,7 +2152,7 @@ * ... but only do that if we are not polling. And because (presumably) * the default is interrupts on, we need to disable them explicitly! */ - if ( ifp->if_flags & IFF_POLLING ) + if (ifp->if_capenable & IFCAP_POLLING ) CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, FXP_SCB_INTR_DISABLE); else #endif /* DEVICE_POLLING */ @@ -2418,11 +2407,30 @@ break; case SIOCSIFCAP: - FXP_LOCK(sc); mask = ifp->if_capenable ^ ifr->ifr_reqcap; - if (mask & IFCAP_POLLING) - ifp->if_capenable ^= IFCAP_POLLING; +#ifdef DEVICE_POLLING + if (mask & IFCAP_POLLING) { + if (ifr->ifr_reqcap & IFCAP_POLLING) { + error = ether_poll_register(fxp_poll, ifp); + if (error) + return(error); + FXP_LOCK(sc); + CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, + FXP_SCB_INTR_DISABLE); + ifp->if_capenable |= IFCAP_POLLING; + FXP_UNLOCK(sc); + } else { + error = ether_poll_deregister(ifp); + /* Enable interrupts in any case */ + FXP_LOCK(sc); + CSR_WRITE_1(sc, FXP_CSR_SCB_INTRCNTL, 0); + ifp->if_capenable &= ~IFCAP_POLLING; + FXP_UNLOCK(sc); + } + } +#endif if (mask & IFCAP_VLAN_MTU) { + FXP_LOCK(sc); ifp->if_capenable ^= IFCAP_VLAN_MTU; if (sc->revision != FXP_REV_82557) flag = FXP_FLAG_LONG_PKT_EN; @@ -2431,8 +2439,8 @@ sc->flags ^= flag; if (ifp->if_flags & IFF_UP) fxp_init_body(sc); + FXP_UNLOCK(sc); } - FXP_UNLOCK(sc); break; default: Index: i386/i386/trap.c =================================================================== RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v retrieving revision 1.280 diff -u -r1.280 trap.c --- i386/i386/trap.c 28 Sep 2005 07:03:03 -0000 1.280 +++ i386/i386/trap.c 29 Sep 2005 09:42:21 -0000 @@ -160,11 +160,6 @@ extern char *syscallnames[]; #endif -#ifdef DEVICE_POLLING -extern u_int32_t poll_in_trap; -extern int ether_poll(int 
count); -#endif /* DEVICE_POLLING */ - /* * Exception, fault, and trap interface to the FreeBSD kernel. * This common code is called from assembly language IDT gate entry @@ -272,11 +267,6 @@ trap_fatal(&frame, eva); } -#ifdef DEVICE_POLLING - if (poll_in_trap) - ether_poll(poll_in_trap); -#endif /* DEVICE_POLLING */ - if ((ISPL(frame.tf_cs) == SEL_UPL) || ((frame.tf_eflags & PSL_VM) && !(PCPU_GET(curpcb)->pcb_flags & PCB_VM86CALL))) { Index: kern/kern_poll.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_poll.c,v retrieving revision 1.22 diff -u -r1.22 kern_poll.c --- kern/kern_poll.c 6 Sep 2005 11:09:18 -0000 1.22 +++ kern/kern_poll.c 30 Sep 2005 14:52:25 -0000 @@ -32,6 +32,7 @@ #include #include #include /* needed by net/if.h */ +#include #include #include @@ -44,14 +45,15 @@ static void netisr_poll(void); /* the two netisr handlers */ static void netisr_pollmore(void); +static int poll_switch(SYSCTL_HANDLER_ARGS); void hardclock_device_poll(void); /* hook from hardclock */ -void ether_poll(int); /* polling while in trap */ +void ether_poll(int); /* polling in idle loop */ /* * Polling support for [network] device drivers. * - * Drivers which support this feature try to register with the + * Drivers which support this feature can register with the * polling code. * * If registration is successful, the driver must disable interrupts, @@ -64,10 +66,6 @@ * POLL_AND_CHECK_STATUS: as above, plus check status registers or do * other more expensive operations. This command is issued periodically * but less frequently than POLL_ONLY. - * POLL_DEREGISTER: deregister and return to interrupt mode. - * - * The first two commands are only issued if the interface is marked as - * 'IFF_UP and IFF_DRV_RUNNING', the last one only if IFF_DRV_RUNNING is set. 
* * The count limit specifies how much work the handler can do during the * call -- typically this is the number of packets to be received, or @@ -75,11 +73,9 @@ * as the max time spent in the function grows roughly linearly with the * count). * - * Deregistration can be requested by the driver itself (typically in the - * *_stop() routine), or by the polling code, by invoking the handler. - * - * Polling can be globally enabled or disabled with the sysctl variable - * kern.polling.enable (default is 0, disabled) + * Polling is enabled and disabled via setting IFCAP_POLLING flag on + * the interface. The driver ioctl handler should register interface + * with polling and disable interrupts, if registration was successful. * * A second variable controls the sharing of CPU between polling/kernel * network processing, and other activities (typically userlevel tasks): @@ -91,7 +87,7 @@ * The following constraints hold * * 1 <= poll_each_burst <= poll_burst <= poll_burst_max - * 0 <= poll_in_trap <= poll_each_burst + * 0 <= poll_each_burst * MIN_POLL_BURST_MAX <= poll_burst_max <= MAX_POLL_BURST_MAX */ @@ -117,10 +113,6 @@ SYSCTL_UINT(_kern_polling, OID_AUTO, idle_poll, CTLFLAG_RW, &poll_in_idle_loop, 0, "Enable device polling in idle loop"); -u_int32_t poll_in_trap; /* used in trap.c */ -SYSCTL_UINT(_kern_polling, OID_AUTO, poll_in_trap, CTLFLAG_RW, - &poll_in_trap, 0, "Poll burst size during a trap"); - static u_int32_t user_frac = 50; SYSCTL_UINT(_kern_polling, OID_AUTO, user_frac, CTLFLAG_RW, &user_frac, 0, "Desired user fraction of cpu time"); @@ -149,9 +141,9 @@ SYSCTL_UINT(_kern_polling, OID_AUTO, handlers, CTLFLAG_RD, &poll_handlers, 0, "Number of registered poll handlers"); -static int polling = 0; /* global polling enable */ -SYSCTL_UINT(_kern_polling, OID_AUTO, enable, CTLFLAG_RW, - &polling, 0, "Polling enabled"); +static int polling = 0; +SYSCTL_PROC(_kern_polling, OID_AUTO, enable, CTLTYPE_UINT | CTLFLAG_RW, + 0, sizeof(int), poll_switch, "I", "Switch 
polling for all interfaces"); static u_int32_t phase; SYSCTL_UINT(_kern_polling, OID_AUTO, phase, CTLFLAG_RW, @@ -174,23 +166,9 @@ struct pollrec { poll_handler_t *handler; struct ifnet *ifp; - /* - * Flags of polling record (protected by poll_mtx). - * PRF_RUNNING means that the handler is now executing. - * PRF_LEAVING means that the handler is now deregistering. - */ -#define PRF_RUNNING 0x1 -#define PRF_LEAVING 0x2 - uint32_t flags; }; static struct pollrec pr[POLL_LIST_LEN]; - -#define PR_VALID(i) (pr[(i)].handler != NULL && \ - !(pr[(i)].flags & (PRF_RUNNING|PRF_LEAVING)) && \ - (pr[(i)].ifp->if_drv_flags & IFF_DRV_RUNNING) &&\ - (pr[(i)].ifp->if_flags & IFF_UP)) - static struct mtx poll_mtx; static void @@ -258,30 +236,24 @@ } /* - * ether_poll is called from the idle loop or from the trap handler. + * ether_poll is called from the idle loop. */ void ether_poll(int count) { int i; + NET_LOCK_GIANT(); mtx_lock(&poll_mtx); if (count > poll_each_burst) count = poll_each_burst; - for (i = 0 ; i < poll_handlers ; i++) { - if (PR_VALID(i)) { - pr[i].flags |= PRF_RUNNING; - mtx_unlock(&poll_mtx); - NET_LOCK_GIANT(); - pr[i].handler(pr[i].ifp, POLL_ONLY, count); - NET_UNLOCK_GIANT(); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_RUNNING; - } - } + for (i = 0 ; i < poll_handlers ; i++) + pr[i].handler(pr[i].ifp, POLL_ONLY, count); + mtx_unlock(&poll_mtx); + NET_UNLOCK_GIANT(); } /* @@ -403,60 +375,29 @@ residual_burst : poll_each_burst; residual_burst -= cycles; - if (polling) { - for (i = 0 ; i < poll_handlers ; i++) { - if (PR_VALID(i)) { - pr[i].flags |= PRF_RUNNING; - mtx_unlock(&poll_mtx); - pr[i].handler(pr[i].ifp, arg, cycles); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_RUNNING; - } - } - } else { /* unregister */ - for (i = 0 ; i < poll_handlers ; i++) { - if (pr[i].handler != NULL && - pr[i].ifp->if_drv_flags & IFF_DRV_RUNNING) { - pr[i].ifp->if_flags &= ~IFF_POLLING; - pr[i].flags |= PRF_LEAVING; - mtx_unlock(&poll_mtx); - pr[i].handler(pr[i].ifp, 
POLL_DEREGISTER, 1); - mtx_lock(&poll_mtx); - pr[i].flags &= ~PRF_LEAVING; - } - pr[i].handler = NULL; - } - residual_burst = 0; - poll_handlers = 0; - } + for (i = 0 ; i < poll_handlers ; i++) + pr[i].handler(pr[i].ifp, arg, cycles); phase = 4; mtx_unlock(&poll_mtx); } /* - * Try to register routine for polling. Returns 1 if successful - * (and polling should be enabled), 0 otherwise. + * Try to register routine for polling. Returns 0 if successful + * (and polling should be enabled), error code otherwise. * A device is not supposed to register itself multiple times. * - * This is called from within the *_intr() functions, so we do not need - * further ifnet locking. + * This is called from within the *_ioctl() functions. */ int ether_poll_register(poll_handler_t *h, struct ifnet *ifp) { int i; - NET_ASSERT_GIANT(); + KASSERT(h != NULL, ("%s: handler is NULL", __func__)); + KASSERT(ifp != NULL, ("%s: ifp is NULL", __func__)); - if (polling == 0) /* polling disabled, cannot register */ - return 0; - if (h == NULL || ifp == NULL) /* bad arguments */ - return 0; - if ( !(ifp->if_flags & IFF_UP) ) /* must be up */ - return 0; - if (ifp->if_flags & IFF_POLLING) /* already polling */ - return 0; + NET_ASSERT_GIANT(); mtx_lock(&poll_mtx); if (poll_handlers >= POLL_LIST_LEN) { @@ -474,7 +415,7 @@ verbose--; } mtx_unlock(&poll_mtx); - return 0; /* no polling for you */ + return (ENOMEM); /* no polling for you */ } for (i = 0 ; i < poll_handlers ; i++) @@ -482,45 +423,39 @@ mtx_unlock(&poll_mtx); log(LOG_DEBUG, "ether_poll_register: %s: handler" " already registered\n", ifp->if_xname); - return (0); + return (EEXIST); } pr[poll_handlers].handler = h; pr[poll_handlers].ifp = ifp; poll_handlers++; - ifp->if_flags |= IFF_POLLING; mtx_unlock(&poll_mtx); if (idlepoll_sleeping) wakeup(&idlepoll_sleeping); - return 1; /* polling enabled in next call */ + return (0); } /* - * Remove interface from the polling list. Normally called by *_stop(). 
- * It is not an error to call it with IFF_POLLING clear, the call is - * sufficiently rare to be preferable to save the space for the extra - * test in each driver in exchange of one additional function call. + * Remove interface from the polling list. Called from *_ioctl(), too. */ int ether_poll_deregister(struct ifnet *ifp) { int i; - NET_ASSERT_GIANT(); + KASSERT(ifp != NULL, ("%s: ifp is NULL", __func__)); - if ( !ifp || !(ifp->if_flags & IFF_POLLING) ) { - return 0; - } + NET_ASSERT_GIANT(); mtx_lock(&poll_mtx); + for (i = 0 ; i < poll_handlers ; i++) if (pr[i].ifp == ifp) /* found it */ break; - ifp->if_flags &= ~IFF_POLLING; /* found or not... */ if (i == poll_handlers) { - mtx_unlock(&poll_mtx); log(LOG_DEBUG, "ether_poll_deregister: %s: not found!\n", ifp->if_xname); - return (0); + mtx_unlock(&poll_mtx); + return (ENOENT); } poll_handlers--; if (i < poll_handlers) { /* Last entry replaces this one. */ @@ -528,7 +463,60 @@ pr[i].ifp = pr[poll_handlers].ifp; } mtx_unlock(&poll_mtx); - return (1); + return (0); +} + +/* + * Legacy interface for turning polling on all interfaces at one time. 
+ */ +static int +poll_switch(SYSCTL_HANDLER_ARGS) +{ + struct ifnet *ifp; + int error; + int val; + + mtx_lock(&poll_mtx); + val = polling; + mtx_unlock(&poll_mtx); + + error = sysctl_handle_int(oidp, &val, sizeof(int), req); + if (error || !req->newptr ) + return (error); + + if (val == polling) + return (0); + + if (val < 0 || val > 1) + return (EINVAL); + + mtx_lock(&poll_mtx); + polling = val; + mtx_unlock(&poll_mtx); + + NET_LOCK_GIANT(); + IFNET_RLOCK(); + TAILQ_FOREACH(ifp, &ifnet, if_link) { + if (ifp->if_capabilities & IFCAP_POLLING) { + struct ifreq ifr; + + if (val == 1) + ifr.ifr_reqcap = + ifp->if_capenable | IFCAP_POLLING; + else + ifr.ifr_reqcap = + ifp->if_capenable & ~IFCAP_POLLING; + IFF_LOCKGIANT(ifp); /* LOR here */ + (void) (*ifp->if_ioctl)(ifp, SIOCSIFCAP, (caddr_t)&ifr); + IFF_UNLOCKGIANT(ifp); + } + } + IFNET_RUNLOCK(); + NET_UNLOCK_GIANT(); + + log(LOG_ERR, "kern.polling.enable is deprecated. Use ifconfig(8)"); + + return (0); } static void Index: net/if.h =================================================================== RCS file: /home/ncvs/src/sys/net/if.h,v retrieving revision 1.98 diff -u -r1.98 if.h --- net/if.h 9 Aug 2005 12:56:20 -0000 1.98 +++ net/if.h 29 Sep 2005 11:25:05 -0000 @@ -148,7 +148,7 @@ #define IFF_LINK2 0x4000 /* per link layer defined bit */ #define IFF_ALTPHYS IFF_LINK2 /* use alternate physical connection */ #define IFF_MULTICAST 0x8000 /* (i) supports multicast */ -#define IFF_POLLING 0x10000 /* (n) Interface is in polling mode. 
*/ +/* 0x10000 */ #define IFF_PPROMISC 0x20000 /* (n) user-requested promisc mode */ #define IFF_MONITOR 0x40000 /* (n) user-requested monitor mode */ #define IFF_STATICARP 0x80000 /* (n) static ARP */ @@ -166,8 +166,7 @@ /* flags set internally only: */ #define IFF_CANTCHANGE \ (IFF_BROADCAST|IFF_POINTOPOINT|IFF_DRV_RUNNING|IFF_DRV_OACTIVE|\ - IFF_SIMPLEX|IFF_MULTICAST|IFF_ALLMULTI|IFF_SMART|IFF_PROMISC|\ - IFF_POLLING) + IFF_SIMPLEX|IFF_MULTICAST|IFF_ALLMULTI|IFF_SMART|IFF_PROMISC) /* * Values for if_link_state. Index: net/if_var.h =================================================================== RCS file: /home/ncvs/src/sys/net/if_var.h,v retrieving revision 1.102 diff -u -r1.102 if_var.h --- net/if_var.h 9 Aug 2005 10:16:17 -0000 1.102 +++ net/if_var.h 29 Sep 2005 10:19:43 -0000 @@ -660,7 +660,7 @@ LLADDR((struct sockaddr_dl *) ifaddr_byindex((ifp)->if_index)->ifa_addr) #ifdef DEVICE_POLLING -enum poll_cmd { POLL_ONLY, POLL_AND_CHECK_STATUS, POLL_DEREGISTER }; +enum poll_cmd { POLL_ONLY, POLL_AND_CHECK_STATUS }; typedef void poll_handler_t(struct ifnet *ifp, enum poll_cmd cmd, int count); int ether_poll_register(poll_handler_t *h, struct ifnet *ifp); --5/uDoXvLw7AC5HRs-- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 18:13:41 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ECDAA16A41F; Fri, 30 Sep 2005 18:13:41 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3012043D49; Fri, 30 Sep 2005 18:13:41 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 8273B52CD0; Fri, 30 Sep 2005 20:13:39 +0200 (CEST) Received: from localhost (dlr254.neoplus.adsl.tpnet.pl [83.24.47.254]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 
bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 21B2A52CA8; Fri, 30 Sep 2005 20:13:31 +0200 (CEST) Date: Fri, 30 Sep 2005 20:13:22 +0200 From: Pawel Jakub Dawidek To: Gleb Smirnoff Message-ID: <20050930181322.GB1768@garage.freebsd.pl> References: <20050930124000.GA45345@cell.sick.ru> <20050930160302.GJ45345@cell.sick.ru> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VbJkn9YxBvnuCH5J" Content-Disposition: inline In-Reply-To: <20050930160302.GJ45345@cell.sick.ru> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.5 required=3.0 tests=BAYES_00,RCVD_IN_NJABL_DUL, RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 18:13:42 -0000 --VbJkn9YxBvnuCH5J Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Sep 30, 2005 at 08:03:02PM +0400, Gleb Smirnoff wrote:
+> On Fri, Sep 30, 2005 at 04:40:00PM +0400, Gleb Smirnoff wrote:
+> T> [please, follow-up on net@ only]
+> T>
+> T> Colleagues,
+> T>
+> T> here are some patches for review.
+>
+> I have some changes to patch after last compile, and haven't tested them
+> before sending patch. Here is an updated one.

BTW, does not compiling in DEVICE_POLLING have any advantage other than a
kernel that is a few bytes smaller?
I wonder if we could drop yet another kernel option and just set
kern.polling.enable to 0 by default.
If adding DEVICE_POLLING to the kernel doesn't cost additional locking, etc. in network data flow paths (which could lead to performance impact in some specific environments) can we just compile the code in always? --=20 Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! --VbJkn9YxBvnuCH5J Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (FreeBSD) iD8DBQFDPYBCForvXbEpPzQRAu1KAKC82f6oU99yWlmLAzPKd2mBdTjWugCfU6Tn 12z9G0H9OQwENzjjVXQXMKA= =dcOn -----END PGP SIGNATURE----- --VbJkn9YxBvnuCH5J-- From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 18:23:28 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ADA2616A41F; Fri, 30 Sep 2005 18:23:28 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (cell.sick.ru [217.72.144.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 02E3243D48; Fri, 30 Sep 2005 18:23:27 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (glebius@localhost [127.0.0.1]) by cell.sick.ru (8.13.3/8.13.3) with ESMTP id j8UINQ2m049001 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Sep 2005 22:23:26 +0400 (MSD) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.sick.ru (8.13.3/8.13.1/Submit) id j8UINQWT049000; Fri, 30 Sep 2005 22:23:26 +0400 (MSD) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.sick.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Fri, 30 Sep 2005 22:23:25 +0400 From: Gleb Smirnoff To: Pawel Jakub Dawidek Message-ID: <20050930182325.GO45345@cell.sick.ru> References: <20050930124000.GA45345@cell.sick.ru> <20050930160302.GJ45345@cell.sick.ru> <20050930181322.GB1768@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: text/plain; 
charset=koi8-r Content-Disposition: inline In-Reply-To: <20050930181322.GB1768@garage.freebsd.pl> User-Agent: Mutt/1.5.6i Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 18:23:28 -0000

On Fri, Sep 30, 2005 at 08:13:22PM +0200, Pawel Jakub Dawidek wrote:
P> On Fri, Sep 30, 2005 at 08:03:02PM +0400, Gleb Smirnoff wrote:
P> +> On Fri, Sep 30, 2005 at 04:40:00PM +0400, Gleb Smirnoff wrote:
P> +> T> [please, follow-up on net@ only]
P> +> T>
P> +> T> Colleagues,
P> +> T>
P> +> T> here are some patches for review.
P> +>
P> +> I have some changes to patch after last compile, and haven't tested them
P> +> before sending patch. Here is an updated one.
P>
P> BTW, does not compiling in DEVICE_POLLING have any advantage other than a
P> kernel that is a few bytes smaller?
P> I wonder if we could drop yet another kernel option and just set
P> kern.polling.enable to 0 by default.
P> If adding DEVICE_POLLING to the kernel doesn't cost additional locking, etc.
P> in network data flow paths (which could lead to performance impact in some
P> specific environments) can we just compile the code in always?

It adds a stub function call every tick. The function returns almost
immediately if no interfaces do polling.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 18:29:41 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A0AE316A41F for ; Fri, 30 Sep 2005 18:29:41 +0000 (GMT) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 35BEA43D4C for ; Fri, 30 Sep 2005 18:29:39 +0000 (GMT) (envelope-from andre@freebsd.org) Received: (qmail 4434 invoked from network); 30 Sep 2005 18:01:56 -0000 Received: from unknown (HELO freebsd.org) ([62.48.0.53]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 30 Sep 2005 18:01:56 -0000 Message-ID: <433D8417.D4666378@freebsd.org> Date: Fri, 30 Sep 2005 20:29:43 +0200 From: Andre Oppermann X-Mailer: Mozilla 4.8 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: Gleb Smirnoff References: <20050930124000.GA45345@cell.sick.ru> <20050930160302.GJ45345@cell.sick.ru> <20050930181322.GB1768@garage.freebsd.pl> <20050930182325.GO45345@cell.sick.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, Pawel Jakub Dawidek , net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 18:29:41 -0000 Gleb Smirnoff wrote: > > On Fri, Sep 30, 2005 at 08:13:22PM +0200, Pawel Jakub Dawidek wrote: > P> On Fri, Sep 30, 2005 at 08:03:02PM +0400, Gleb Smirnoff wrote: > P> +> On Fri, Sep 30, 2005 at 04:40:00PM +0400, Gleb Smirnoff wrote: > P> +> T> [please, follow-up on net@ only] > P> +> T> > P> +> T> Colleagues, > P> +> T> > P> +> T> here are some patches for review. 
> P> +> > P> +> I have some changes to patch after last compile, and haven't tested them > P> +> befire sending patch. Here is an updated one. > P> > P> BTW. Not compiling in DEVICE_POLLING has any advantages except few bytes > P> smaller kernel? > P> I wonder if we could drop yet another kernel option and just set > P> kern.poll.enable to 0 by default. > P> If adding DEVICE_POLLING to the kernel doesn't cost additional locking, etc. > P> in network data flow paths (which could lead to performance impact in some > P> specific environments) can we just compile the code in always? > > It adds a stub function call every tick. The function returns almost > immediately if no interfaces do polling. If it does a FOREACH(interface) then it should stay as a kernel option. -- Andre From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 19:33:34 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3C27016A41F; Fri, 30 Sep 2005 19:33:34 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id E85AD43D48; Fri, 30 Sep 2005 19:33:33 +0000 (GMT) (envelope-from rizzo@icir.org) Received: from xorpc.icir.org (localhost [127.0.0.1]) by xorpc.icir.org (8.12.11/8.12.11) with ESMTP id j8UJXXfT048897; Fri, 30 Sep 2005 12:33:33 -0700 (PDT) (envelope-from rizzo@xorpc.icir.org) Received: (from rizzo@localhost) by xorpc.icir.org (8.12.11/8.12.3/Submit) id j8UJXX1Z048896; Fri, 30 Sep 2005 12:33:33 -0700 (PDT) (envelope-from rizzo) Date: Fri, 30 Sep 2005 12:33:33 -0700 From: Luigi Rizzo To: Andre Oppermann Message-ID: <20050930123333.A48015@xorpc.icir.org> References: <20050930124000.GA45345@cell.sick.ru> <20050930160302.GJ45345@cell.sick.ru> <20050930181322.GB1768@garage.freebsd.pl> <20050930182325.GO45345@cell.sick.ru> <433D8417.D4666378@freebsd.org> Mime-Version: 
1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <433D8417.D4666378@freebsd.org>; from andre@freebsd.org on Fri, Sep 30, 2005 at 08:29:43PM +0200 Cc: arch@freebsd.org, Gleb Smirnoff , Pawel Jakub Dawidek , net@freebsd.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 19:33:34 -0000

On Fri, Sep 30, 2005 at 08:29:43PM +0200, Andre Oppermann wrote:
...
> > It adds a stub function call every tick. The function returns almost
> > immediately if no interfaces do polling.
>
> If it does a FOREACH(interface) then it should stay as a kernel option.

this wasn't the case when i first wrote it - the list of interfaces
actually using polling was stored in an array and the count was
in a variable, so the loop was something like

	for (i = 0; i < actively_polling_interfaces; i++)
		foo[i]->poll()

so it's basically just an extra function call per tick if no interfaces
are doing polling.
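
A rough, self-contained sketch of the array-plus-counter scheme described above may help. It is not the kern_poll.c code: the names (`poll_register`, `poll_tick`, `count_handler`), the `void *softc` argument and the four-entry capacity are assumptions made for illustration only. The point it shows is that the per-tick cost with no registered handlers is one function call and one failed loop test.

```c
#include <assert.h>
#include <stddef.h>

#define POLL_LIST_LEN 4		/* hypothetical capacity for this sketch */

typedef void poll_handler_t(void *softc, int count);

struct pollrec {
	poll_handler_t *handler;
	void *softc;
};

static struct pollrec pr[POLL_LIST_LEN];
static int poll_handlers;	/* number of interfaces currently polling */

/* Append a handler; returns 0 on success, -1 if the array is full. */
static int
poll_register(poll_handler_t *h, void *softc)
{
	assert(h != NULL);
	if (poll_handlers >= POLL_LIST_LEN)
		return (-1);
	pr[poll_handlers].handler = h;
	pr[poll_handlers].softc = softc;
	poll_handlers++;
	return (0);
}

/*
 * Per-tick hook.  With nothing registered the loop body never runs,
 * so the call returns right away.  Returns the number of handlers run.
 */
static int
poll_tick(int count)
{
	int i;

	for (i = 0; i < poll_handlers; i++)
		pr[i].handler(pr[i].softc, count);
	return (i);
}

/* Example handler: adds the packet budget it was given to a counter. */
static void
count_handler(void *softc, int count)
{
	*(int *)softc += count;
}
```

Calling `poll_tick()` before any `poll_register()` call invokes no handlers at all; after one registration, each tick runs exactly that handler with the given budget.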
cheers luigi From owner-freebsd-arch@FreeBSD.ORG Fri Sep 30 21:17:20 2005 Return-Path: X-Original-To: arch@FreeBSD.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C203216A41F; Fri, 30 Sep 2005 21:17:20 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (cell.sick.ru [217.72.144.68]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1087143D48; Fri, 30 Sep 2005 21:17:19 +0000 (GMT) (envelope-from glebius@FreeBSD.org) Received: from cell.sick.ru (glebius@localhost [127.0.0.1]) by cell.sick.ru (8.13.3/8.13.3) with ESMTP id j8ULHHFs049824 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 1 Oct 2005 01:17:17 +0400 (MSD) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.sick.ru (8.13.3/8.13.1/Submit) id j8ULHG74049823; Sat, 1 Oct 2005 01:17:16 +0400 (MSD) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.sick.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Sat, 1 Oct 2005 01:17:16 +0400 From: Gleb Smirnoff To: Andre Oppermann Message-ID: <20050930211716.GP45345@cell.sick.ru> References: <20050930124000.GA45345@cell.sick.ru> <20050930160302.GJ45345@cell.sick.ru> <20050930181322.GB1768@garage.freebsd.pl> <20050930182325.GO45345@cell.sick.ru> <433D8417.D4666378@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <433D8417.D4666378@freebsd.org> User-Agent: Mutt/1.5.6i Cc: arch@FreeBSD.org, Pawel Jakub Dawidek , net@FreeBSD.org Subject: Re: [REVIEW/TEST] polling(4) changes X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Sep 2005 21:17:20 -0000 On Fri, Sep 30, 2005 at 08:29:43PM +0200, Andre Oppermann wrote: A> > It adds a stub 
function call every tick. The function returns almost A> > immediately if no interfaces do polling. A> A> If it does a FOREACH(interface) then it should stay as a kernel option. It isn't. Just 'if (poll_handlers == 0) return;'. Anyway, deoptionalizing polling would be a separate commit and discussion. -- Totus tuus, Glebius. GLEBIUS-RIPN GLEB-RIPE From owner-freebsd-arch@FreeBSD.ORG Sat Oct 1 09:35:51 2005 Return-Path: X-Original-To: arch@freebsd.org Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1B92216A41F for ; Sat, 1 Oct 2005 09:35:51 +0000 (GMT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id BE03543D45 for ; Sat, 1 Oct 2005 09:35:50 +0000 (GMT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.13.0/8.13.0) with ESMTP id j919Zo6R002360 for ; Sat, 1 Oct 2005 02:35:50 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.13.0/8.13.0/Submit) id j919ZoN1002359 for arch@freebsd.org; Sat, 1 Oct 2005 02:35:50 -0700 Date: Sat, 1 Oct 2005 02:35:50 -0700 From: Brooks Davis To: arch@freebsd.org Message-ID: <20051001093550.GA32354@odin.ac.hmc.edu> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TB36FDmn/VVEgNH/" Content-Disposition: inline User-Agent: Mutt/1.4.1i X-Virus-Scanned: by amavisd-new X-Spam-Status: No, hits=0.0 required=8.0 tests=none autolearn=no version=2.63 X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on odin.ac.hmc.edu Cc: Subject: error in trimdomain(3) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Oct 2005 
09:35:51 -0000 --TB36FDmn/VVEgNH/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

I discovered today that the trimdomain() implementation in libutil
deviates slightly from the manpage. The manpage says:

     The function trimdomain() removes the current domain name from the passed
     fullhost name by writing a NUL character over the first period of the
                                                       ^^^^^^^^^^^^
     passed name.  The current domain name is determined by calling
     gethostname(3) and removing everything up to the first period.

which clearly indicates that trimdomain() should return either the
unmodified string or a host name with no domain. In reality it will
remove the domain name even if the result is not a host name. This
means that if the host b.com calls trimdomain with "a.b.com" as the
input string, the result is "a.b". This causes rshd to fail when
a.b.com attempts to connect to b.com because rshd uses realhostname_sa
to check the rhosts file. To make matters worse, rlogind does not do
this since it rolls its own auth rather than using pam, so "rsh b.com"
(really "rlogin b.com") works fine while "rsh b.com command" blows up,
making for all sorts of hair pulling fun (particularly since the b.com
to a.b.com direction works just fine).

Fixing trimdomain is in fact trivial (turn a for loop into an if
statement), but I'm concerned that somewhere out there someone is
relying on the prior behavior. Mostly trimdomain is used in logging
functions so it doesn't seem like a big issue. Any objections to
committing this change (along with a regression test)?

-- Brooks

http://perforce.freebsd.org/chv.cgi?CH=84604

Change 84604 by brooks@brooks_fellow on 2005/10/01 08:58:10

	Implement the documented behavior of trimdomain. The key change
	is that calling trimdomain with "a.b.com" on host "b.com" now
	leaves the input untouched instead of mangling it to "a.b".

	This fixes rsh connections from a.b.com to b.com.
	Because rsh uses an entirely different authentication implementation
	than rlogin, "rsh b.com" worked, but "rsh b.com command" did not.

Affected files ...

... //depot/user/brooks/cleanup/lib/libutil/trimdomain.c#2 edit

Differences ...

==== //depot/user/brooks/cleanup/lib/libutil/trimdomain.c#2 (text+ko) ====

@@ -75,18 +75,16 @@
 
 	s = fullhost;
 	end = s + hostsize + 1;
-	for (; (s = memchr(s, '.', (size_t)(end - s))) != NULL; s++) {
+	if ((s = memchr(s, '.', (size_t)(end - s))) != NULL) {
 		if (strncasecmp(s + 1, domain, dlen) == 0) {
 			if (s[dlen + 1] == '\0') {
 				/* Found -- lose the domain. */
 				*s = '\0';
-				break;
 			} else if (s[dlen + 1] == ':' &&
 			    isDISP(s + dlen + 2) &&
 			    (len = strlen(s + dlen + 1)) < (size_t)(end - s)) {
 				/* Found -- shuffle the DISPLAY back. */
 				memmove(s, s + dlen + 1, len + 1);
-				break;
 			}
 		}
 	}

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4

--TB36FDmn/VVEgNH/ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFDPlh1XY6L6fI4GtQRAvsBAKCc8tqF4GspUsdg2H09d4aBMjMTQgCgzSFD 4aB2nkz7+b0kX1bDprV2oC4= =/YZr -----END PGP SIGNATURE----- --TB36FDmn/VVEgNH/--
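
To make the difference in the trimdomain change above concrete, here is a simplified sketch of the two behaviors. It is not the libutil code: the helper names `trim_first_label` and `trim_any_label` are hypothetical, the domain is passed in as an argument instead of being derived from gethostname(3), and the DISPLAY handling is left out. On host "b.com" the derived domain would be "com", which is what the usage note below assumes.

```c
#include <string.h>
#include <strings.h>

/*
 * Documented behavior (hypothetical helper): look only at the first
 * period, and cut there only if the remainder is exactly the domain.
 */
static void
trim_first_label(char *fullhost, const char *domain)
{
	char *dot = strchr(fullhost, '.');

	if (dot != NULL && strcasecmp(dot + 1, domain) == 0)
		*dot = '\0';
}

/*
 * Pre-change behavior (hypothetical helper): try every period in turn,
 * so a name can be cut in the middle and stop being a host name.
 */
static void
trim_any_label(char *fullhost, const char *domain)
{
	char *dot;

	for (dot = strchr(fullhost, '.'); dot != NULL;
	    dot = strchr(dot + 1, '.')) {
		if (strcasecmp(dot + 1, domain) == 0) {
			*dot = '\0';
			break;
		}
	}
}
```

With the domain "com", `trim_any_label` turns "a.b.com" into "a.b", while `trim_first_label` leaves "a.b.com" untouched and still trims "host.com" to "host".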