From owner-freebsd-net@FreeBSD.ORG Sun Dec 23 04:04:42 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FDDA16A419; Sun, 23 Dec 2007 04:04:42 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 5BDA713C459; Sun, 23 Dec 2007 04:04:42 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id B0C5E1A4D82; Sat, 22 Dec 2007 20:03:08 -0800 (PST) Date: Sat, 22 Dec 2007 20:03:08 -0800 From: Alfred Perlstein To: David G Lawrence Message-ID: <20071223040308.GT16982@elvis.mu.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <20071222002432.GK16982@elvis.mu.org> <20071222073236.GW25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071222073236.GW25053@tnn.dglawrence.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Dec 2007 04:04:42 -0000 * David G Lawrence [071221 23:31] wrote: > > > > Can you use a placeholder vnode as a place to restart the scan? > > > > you might have to mark it special so that other threads/things > > > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > > > restart point. 
> > >
> > > That was one of the solutions that I considered and rejected since it
> > > would significantly increase the overhead of the loop.
> > > The solution provided by Kostik Belousov that uses uio_yield looks like
> > > a fine solution. I intend to try it out on some servers RSN.
> >
> > Out of curiosity's sake, why would it make the loop slower? one
> > would only add the placeholder when yielding, not for every iteration.
>
> Actually, I misread your suggestion and was thinking marker flag,
> rather than placeholder vnode. Sorry about that. The current code
> actually already uses a marker vnode. It is hidden and obfuscated in
> the MNT_VNODE_FOREACH macro, further hidden in the __mnt_vnode_first/next
> functions, so it should be safe from vnode reclamation/free problems.

That level of obscuring is a bit worrisome.

Yes, I did mean placeholder vnode. Even so, is it of utility or not?
Or is it already being used and I'm missing something and should just
"utsl" at this point?

--
- Alfred Perlstein

From owner-freebsd-net@FreeBSD.ORG Sun Dec 23 20:29:44 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 228C616A418; Sun, 23 Dec 2007 20:29:44 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 102C013C469; Sun, 23 Dec 2007 20:29:44 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBNKThuj095890; Sun, 23 Dec 2007 20:29:43 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBNKThCv095886; Sun, 23 Dec 2007 20:29:43 GMT (envelope-from remko) Date: Sun, 23 Dec 2007 20:29:43 GMT Message-Id:
<200712232029.lBNKThCv095886@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: kern/118975: [bge] [patch] Broadcom 5906 not handled by FreeBSD X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Dec 2007 20:29:44 -0000 Synopsis: [bge] [patch] Broadcom 5906 not handled by FreeBSD Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: remko Responsible-Changed-When: Sun Dec 23 20:29:32 UTC 2007 Responsible-Changed-Why: reassign to -net maintainers http://www.freebsd.org/cgi/query-pr.cgi?pr=118975 From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 11:07:02 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B667216A418 for ; Mon, 24 Dec 2007 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8E27513C4E7 for ; Mon, 24 Dec 2007 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBOB728Y032008 for ; Mon, 24 Dec 2007 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBOB71pX032004 for freebsd-net@FreeBSD.org; Mon, 24 Dec 2007 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 24 Dec 2007 11:07:01 GMT Message-Id: <200712241107.lBOB71pX032004@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: 
freebsd-net@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 11:07:02 -0000 Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- f kern/115360 net [ipv6] IPv6 address and if_bridge don't play well toge 1 problem total. Serious problems S Tracker Resp. Description -------------------------------------------------------------------------------- a kern/38554 net changing interface ipaddress doesn't seem to work s kern/39937 net ipstealth issue f kern/62374 net panic: free: multiple frees s kern/81147 net [net] [patch] em0 reinitialization while adding aliase o kern/92552 net A serious bug in most network drivers from 5.X to 6.X s kern/95665 net [if_tun] "ping: sendto: No buffer space available" wit s kern/105943 net Network stack may modify read-only mbuf chain copies o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets o kern/108542 net [bce]: Huge network latencies with 6.2-RELEASE / STABL o kern/110959 net [ipsec] Filtering incoming packets with enc0 does not o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38 o kern/112722 net IP v4 udp fragmented packet reject o kern/113457 net [ipv6] deadlock occurs if a tunnel goes down while the o kern/113842 net [ipv6] PF_INET6 proto domain state can't be cleared wi o kern/114714 net [gre][patch] gre(4) is not MPSAFE and does not support o kern/114839 net [fxp] fxp looses ability to speak with traffic o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat o kern/116077 net 6.2-STABLE panic during use of multi-cast networking c o 
kern/116172 net Network / ipv6 recursive mutex panic o kern/116185 net if_iwi driver leads system to reboot o kern/116328 net [bge]: Solid hang with bge interface o kern/116747 net [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile o kern/116837 net ifconfig tunX destroy: panic o kern/117271 net [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap o kern/117423 net Duplicate IP on different interfaces o bin/117448 net [carp] 6.2 kernel crash o kern/117717 net [panic] Kernel panic with Bittorrent client. o kern/118880 net [ipv6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented 29 problems total. Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o conf/23063 net [PATCH] for static ARP tables in rc.network s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr o kern/54383 net [nfs] [patch] NFS root configurations without dynamic s kern/60293 net FreeBSD arp poison patch o kern/95267 net packet drops periodically appear f kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return o kern/100519 net [netisr] suggestion to fix suboptimal network polling o kern/102035 net [plip] plip networking disables parallel port printing o conf/102502 net [patch] ifconfig name does't rename netgraph node in n o conf/107035 net [patch] bridge interface given in rc.conf not taking a o kern/112654 net [pcn] Kernel panic upon if_pcn module load on a Netfin o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f o bin/116643 net [patch] fstat(1): add INET/INET6 socket details as in o bin/117339 net [patch] route(8): loading routing management commands o kern/118722 net [tcp] Many old TCP connections in SYN_RCVD state o kern/118727 net [ng] [patch] add new ng_pf module o kern/118879 net [bge] [patch] bge has checksum problems on the 5703 ch o kern/118975 net [bge] [patch] Broadcom 5906 not handled by FreeBSD 18 problems total. 
From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 11:43:48 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F31D616A418 for ; Mon, 24 Dec 2007 11:43:47 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 776C013C457 for ; Mon, 24 Dec 2007 11:43:47 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 2159047BC9; Mon, 24 Dec 2007 06:43:47 -0500 (EST) Date: Mon, 24 Dec 2007 11:43:46 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Vlad GALU In-Reply-To: Message-ID: <20071224113901.M40176@fledge.watson.org> References: <4755EFDD.8070609@isc.org> <20071205021851.V87930@fledge.watson.org> <20071205093244.U87930@fledge.watson.org> <20071205094657.P87930@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Peter Losher Subject: Zero-copy BPF update (was: Re: Aggregating many ports into one for tcpdump server.) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 11:43:48 -0000 On Wed, 5 Dec 2007, Vlad GALU wrote: >>> I've had several reports of significantly improved packet capture rates at >>> high speeds with it, but it's not yet in the tree because we feel it needs >>> more evaluation and review. I hope to ship some form of zero-copy BPF >>> buffer support in FreeBSD 8, and possibly even MFC it. Any feedback you >>> might have would be most helpful. 
>>
>> Having sent you the patch, I should have let you know that you'll need to:
>>
>> - Add options BPF_ZEROCOPY to your kernel configuration to enable the
>>   zero-copy buffering mode.
>>
>> - Make sure the kernel and libpcap are rebuilt following the application of
>>   the patch and dropping in the tarball.
>>
>> - setenv BPF_ZERO_COPY before running tcpdump or other BPF-based tools to
>>   enable the zero-copy buffer mode.
>>
>> The patch includes both kernel changes (abstract the buffer model, add a
>> new buffer model) and user space changes (updated libpcap to speak the new
>> model, selected right now with the environmental variable). Presumably if
>> merged, zero-copy BPF buffers would be used by default via libpcap if
>> present in the kernel, but right now this is all for evaluation purposes.
>
> Thanks, Robert! I'll start running a few tests next week, I'm waiting for
> some hardware to arrive first.

I've put up an updated tarball based on some recent changes here:

http://www.watson.org/~robert/freebsd/20071226-zcopybpf.tgz

The main changes since this last drop are:

- BPF_ZERO_COPY environmental variable renamed to BPF_ZEROCOPY to match kernel
  option name.

- libpcap support for zero-copy BPF buffers reworked to avoid unconditional
  call to select() for each buffer when there's already a pending buffer
  available to use; in general, avoid system calls entirely when there's data
  already waiting, only use system calls when there isn't a completed buffer
  to work on next.

- Comments cleanup and some code cleanup.

- A README to provide a little more guidance on getting it working. :-)

You will need to "make clean ; make ; make install" in the modified libpcap
again, as the size of pcap_t has changed. In principle "make ; make install"
should DTRT, but it appears not to for me.
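
As an illustration of the "avoid system calls when data is already waiting" change listed above, here is a minimal userspace sketch in C. It is not taken from the patch: the header layout, the generation counters, and every name in it (shared_hdr, buffer_ready, next_buffer, blocking_wait and so on) are assumptions made up for the example, and blocking_wait() only stands in for a real select() call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical header at the start of a buffer shared with a producer. */
struct shared_hdr {
    atomic_uint kernel_gen;   /* bumped by the producer after filling the buffer */
    atomic_uint user_gen;     /* set by the consumer when it is done with it */
    unsigned int len;         /* valid bytes; written before kernel_gen is bumped */
};

struct reader {
    struct shared_hdr bufs[2];
    int wait_calls;           /* times we had to fall back to the blocking path */
};

void reader_init(struct reader *r)
{
    for (int i = 0; i < 2; i++) {
        atomic_init(&r->bufs[i].kernel_gen, 0);
        atomic_init(&r->bufs[i].user_gen, 0);
        r->bufs[i].len = 0;
    }
    r->wait_calls = 0;
}

/* Producer stand-in: publish len, then bump the generation with release order. */
void producer_fill(struct shared_hdr *h, unsigned int len)
{
    h->len = len;
    atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}

/* A buffer is ready when the producer's generation is ahead of ours. */
int buffer_ready(struct shared_hdr *h)
{
    return atomic_load_explicit(&h->kernel_gen, memory_order_acquire) !=
        atomic_load_explicit(&h->user_gen, memory_order_relaxed);
}

/* Hand the buffer back by catching up with the producer's generation. */
void buffer_release(struct shared_hdr *h)
{
    unsigned int g = atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);
    atomic_store_explicit(&h->user_gen, g, memory_order_release);
}

/* Stand-in for select(): counts the call and pretends buffer 1 got filled. */
static void blocking_wait(struct reader *r)
{
    r->wait_calls++;
    producer_fill(&r->bufs[1], 64);
}

/* Return the index of a completed buffer; block only if none is pending. */
int next_buffer(struct reader *r)
{
    assert(r != NULL);
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < 2; i++)
            if (buffer_ready(&r->bufs[i]))
                return i;
        if (pass == 0)
            blocking_wait(r);
    }
    return -1;
}
```

In this sketch next_buffer() touches only shared memory when a buffer is already complete, so wait_calls stays at zero in that case; the blocking path is taken only when both buffers are idle.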
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 11:54:10 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3460616A419; Mon, 24 Dec 2007 11:54:10 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id EF05B13C4DB; Mon, 24 Dec 2007 11:54:09 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 9BE6447911; Mon, 24 Dec 2007 06:54:09 -0500 (EST) Date: Mon, 24 Dec 2007 11:54:09 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: dima <_pppp@mail.ru> In-Reply-To: Message-ID: <20071224114504.E40176@fledge.watson.org> References: <20071220135342.O67327@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: Re: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 11:54:10 -0000 On Thu, 20 Dec 2007, dima wrote: >> Per earlier e-mail, I've created a page to track the various on-going >> projects: >> >> http://wiki.freebsd.org/TCPProjects8 >> >> Rui has already kindly added the TCP ECN work to the page. > > As I know, we have a single swi:net thread in the kernel yet. Are there any > plans to make several such threads? If yes, this activity isn't mentioned in > wiki. > > There are 2 ideas: 1. per-core thread 2. per-interface thread I like the > second more. This is a kind of tricky point, and one we will definitely be looking at. 
In FreeBSD 6, we did link layer processing in the ithread, and deferred network layer and socket layer processing to the netisr and user thread. In FreeBSD 7, we process up through the network layer and socket deliver in the ithread, and only the socket read/copyout are deferred to the user thread. This means that in FreeBSD 7, we get true parallelism between different input sources. We still have the netisr, which is used for certain types of deferred processing, such as loopback network traffic (in order to avoid entering the receive path from the transmit path), IPSEC tunnel processing, etc, but for general ethernet traffic, it is not used. This appears to work really well for a small number of interfaces because we eliminate a large number of context switches, and pushed the "drop point" from software into hardware, meaning that we don't burn cycles doing link layer processing for packets that will never make it to the network layer (netisr queue overflow). The two real downsides are that this promotes network layer processing to interrupt priority rather than soft interrupt priority (and this may propagate to more other threads), and that the opportunity for parallelism is reduced between the link layer and the network processing layer. The reason we went ahead and made the default change (it's configurable at runtime) is that it seemed that in most cases, we saw a significant performance improvement. However, the current ithread/direct dispatch model has scaling issues as we approach larger numbers of interfaces, as the ithread approach does generally, because when the number of active thread exceeds the number of cores and the system is really busy, context switches are re-introduced, as well as an increased chance of ithreads bouncing around, etc. What to do at that point is an interesting question--would we be better off reducing the number of active threads so that we have a small ithread worker pool serving many devices, for example? 
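
To make the "drop point" argument above concrete, here is a toy accounting model in C. It is only a sketch under loud assumptions: the names (stats, run_queued, run_direct), the one-unit-of-work-per-packet cost, and the fixed "budget" are all invented for illustration and do not correspond to any kernel code.

```c
#include <assert.h>

struct stats {
    int delivered;     /* packets that reached the network layer */
    int sw_drops;      /* dropped in software, after link-layer work was spent */
    int hw_drops;      /* left for the NIC to drop; no CPU spent on them */
    int wasted_work;   /* link-layer work units spent on packets later dropped */
};

/*
 * Queued model (assumed): the ithread does link-layer work on every packet
 * of a burst and then enqueues it; the queue holds at most qlen packets and
 * is not drained during the burst.
 */
struct stats run_queued(int burst, int qlen)
{
    struct stats s = { 0, 0, 0, 0 };
    int queued = 0;

    assert(burst >= 0 && qlen >= 0);
    for (int i = 0; i < burst; i++) {
        /* one unit of link-layer work has been spent on packet i here */
        if (queued < qlen) {
            queued++;
        } else {
            s.sw_drops++;      /* queue overflow */
            s.wasted_work++;   /* the link-layer work above was for nothing */
        }
    }
    s.delivered = queued;      /* drained by the deferred thread later */
    return s;
}

/*
 * Direct-dispatch model (assumed): the ithread carries each packet through
 * to the network layer; "budget" is how many packets it can finish before
 * the receive ring overflows and the hardware drops the rest.
 */
struct stats run_direct(int burst, int budget)
{
    struct stats s = { 0, 0, 0, 0 };

    assert(burst >= 0 && budget >= 0);
    for (int i = 0; i < burst; i++) {
        if (s.delivered < budget)
            s.delivered++;
        else
            s.hw_drops++;      /* never touched by the CPU */
    }
    return s;
}
```

With a burst of 10 packets and room for 4, both models deliver 4 packets in this sketch, but the queued model spends 6 units of link-layer work on packets it then discards, while the direct model leaves those 6 for the hardware to drop.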
So, in answer to your original question: we already do a per-interface thread for all in-bound processing in FreeBSD 7, but we'll need to continue to work on the underlying model and its behavior under high load. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 13:19:22 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87B0616A419; Mon, 24 Dec 2007 13:19:22 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 073D013C458; Mon, 24 Dec 2007 13:19:21 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J6nD9-000K30-En; Mon, 24 Dec 2007 15:19:20 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBODJ71x098885; Mon, 24 Dec 2007 15:19:07 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBODJ68N098878; Mon, 24 Dec 2007 15:19:06 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 24 Dec 2007 15:19:06 +0200 From: Kostik Belousov To: Bruce Evans Message-ID: <20071224131906.GB57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; 
boundary="nuX4E7Vid9I2gnPU" Content-Disposition: inline In-Reply-To: <20071223095314.G1323@delplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 33ca3efb02d57e4381ef3dfbedf9ea6a X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1955 [Dec 24 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: Mark Fullmer , freebsd-stable@freebsd.org, "Freebsd-Net@Freebsd. Org" Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 13:19:22 -0000

--nuX4E7Vid9I2gnPU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Dec 23, 2007 at 10:20:31AM +1100, Bruce Evans wrote:
> On Sat, 22 Dec 2007, Kostik Belousov wrote:
> >Ok, since you talked about this first :). I already made the following
> >patch, but did not publish it since I still have not inspected all
> >callers of MNT_VNODE_FOREACH() for safety of dropping mount interlock.
> >It shall be safe, but better to check. Also, I postponed the check
> >until it was reported that yielding does solve the original problem.
>
> Good. I'd still like to unobfuscate the function call.
What do you mean there ?

> Putting the count in the union seems fragile at best. Even if nothing
> can access the marker vnode, you need to context-switch its old contents
> while using it for the count, in case its old contents is used.
> Vnode-printing routines might still be confused.
Could you, please, describe what you mean by "context-switch" for the
VMARKER ?

Mark, could you, please, retest the patch below in your setup ?
I want to put a change or some edition of it into the 7.0 release, and
we need to move fast to do this.

diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index 14acc5b..046af82 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -1994,6 +1994,12 @@ __mnt_vnode_next(struct vnode **mvp, struct mount *mp)
 	mtx_assert(MNT_MTX(mp), MA_OWNED);
 
 	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+	if ((*mvp)->v_yield++ == 500) {
+		MNT_IUNLOCK(mp);
+		(*mvp)->v_yield = 0;
+		uio_yield();
+		MNT_ILOCK(mp);
+	}
 	vp = TAILQ_NEXT(*mvp, v_nmntvnodes);
 	while (vp != NULL && vp->v_type == VMARKER)
 		vp = TAILQ_NEXT(vp, v_nmntvnodes);
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index dc70417..6e3119b 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -131,6 +131,7 @@ struct vnode {
 		struct socket	*vu_socket;	/* v unix domain net (VSOCK) */
 		struct cdev	*vu_cdev;	/* v device (VCHR, VBLK) */
 		struct fifoinfo	*vu_fifoinfo;	/* v fifo (VFIFO) */
+		int		vu_yield;	/* yield count (VMARKER) */
 	} v_un;
 
 	/*
@@ -185,6 +186,7 @@ struct vnode {
 #define	v_socket	v_un.vu_socket
 #define	v_rdev		v_un.vu_cdev
 #define	v_fifoinfo	v_un.vu_fifoinfo
+#define	v_yield		v_un.vu_yield
 
 /* XXX: These are temporary to avoid a source sweep at this time */
 #define v_object	v_bufobj.bo_object

--nuX4E7Vid9I2gnPU
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHb7HJC3+MBN1Mb4gRAjM7AJ9DxpeIkxYG5g3BxgVpfoRsYGgVYgCgwfu+
pWV14zgIp7rRQRHlldbdOw4=
=Ej13
-----END PGP SIGNATURE-----

--nuX4E7Vid9I2gnPU--

From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 22:24:21 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id
A77EC16A421; Mon, 24 Dec 2007 22:24:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 3E36713C43E; Mon, 24 Dec 2007 22:24:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBOMOChC015490 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 25 Dec 2007 09:24:18 +1100 Date: Tue, 25 Dec 2007 09:24:12 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kostik Belousov In-Reply-To: <20071224131906.GB57756@deviant.kiev.zoral.com.ua> Message-ID: <20071225091009.L3200@besplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> <20071224131906.GB57756@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Mark Fullmer , freebsd-stable@freebsd.org, "Freebsd-Net@Freebsd. Org" Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 22:24:21 -0000 On Mon, 24 Dec 2007, Kostik Belousov wrote: > On Sun, Dec 23, 2007 at 10:20:31AM +1100, Bruce Evans wrote: >> On Sat, 22 Dec 2007, Kostik Belousov wrote: >>> Ok, since you talked about this first :). I already made the following >>> patch, but did not published it since I still did not inspected all >>> callers of MNT_VNODE_FOREACH() for safety of dropping mount interlock. >>> It shall be safe, but better to check. 
Also, I postponed the check >>> until it was reported that yielding does solve the original problem. >> >> Good. I'd still like to unobfuscate the function call. > What do you mean there ? Make the loop control and overheads clear by making the function call explicit, maybe by expanding MNT_VNODE_FOREACH() inline after fixing the style bugs in it. Later, fix the code to match the comment again by not making a function call in the usual case. This is harder. >> Putting the count in the union seems fragile at best. Even if nothing >> can access the marker vnode, you need to context-switch its old contents >> while using it for the count, in case its old contents is used. Vnode- >> printing routines might still be confused. > Could you, please, describe what you mean by "contex-switch" for the > VMARKER ? Oh, I didn't notice that the marker vnode is out of band (a whole new vnode is malloced for each marker). The context switching would be needed if an ordinary active vnode that uses the union is used as a marker. 
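
For readers following along, the marker-plus-yield idea under discussion can be sketched in userspace C roughly as below. This is only an illustration under stated assumptions: the list, node and function names (scan_first, scan_next, unlock_yield_relock, demo_scan) are hypothetical, a hand-rolled list replaces the kernel's TAILQ, and unlock_yield_relock() merely counts where the real code would drop the mount interlock, yield, and retake it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum node_type { NODE_REAL, NODE_MARKER };

struct node {
    enum node_type type;
    int value;              /* payload of a real node */
    int yield_count;        /* used only while the node serves as a marker */
    struct node *prev, *next;
};

struct list {
    struct node *head, *tail;
    int yields;             /* how often the scan gave up the (pretend) lock */
};

#define YIELD_INTERVAL 500  /* same threshold as in the posted patch */

/* Insert n after pos; pos == NULL means "at the head". */
static void list_insert_after(struct list *l, struct node *pos, struct node *n)
{
    n->prev = pos;
    n->next = (pos != NULL) ? pos->next : l->head;
    if (n->next != NULL) n->next->prev = n; else l->tail = n;
    if (pos != NULL) pos->next = n; else l->head = n;
}

static void list_remove(struct list *l, struct node *n)
{
    if (n->prev != NULL) n->prev->next = n->next; else l->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Stand-in for: unlock, yield the CPU, lock again. */
static void unlock_yield_relock(struct list *l) { l->yields++; }

/* Advance past the marker, skip foreign markers, re-park our marker. */
static struct node *scan_next(struct list *l, struct node *marker)
{
    struct node *n;

    assert(marker->type == NODE_MARKER);
    if (marker->yield_count++ == YIELD_INTERVAL) {
        marker->yield_count = 0;
        /* The marker keeps our place while others may change the list. */
        unlock_yield_relock(l);
    }
    n = marker->next;
    while (n != NULL && n->type == NODE_MARKER)
        n = n->next;
    list_remove(l, marker);
    if (n != NULL)
        list_insert_after(l, n, marker);
    return n;
}

static struct node *scan_first(struct list *l, struct node *marker)
{
    marker->type = NODE_MARKER;
    marker->yield_count = 0;
    list_insert_after(l, NULL, marker);
    return scan_next(l, marker);
}

/* Build nreal nodes valued 1..nreal, scan them, report the sum and yields. */
long demo_scan(int nreal, int *yields_out)
{
    struct list l = { NULL, NULL, 0 };
    struct node marker = { NODE_MARKER, 0, 0, NULL, NULL };
    struct node *nodes, *n;
    long sum = 0;

    assert(nreal > 0 && yields_out != NULL);
    nodes = calloc((size_t)nreal, sizeof(*nodes));
    assert(nodes != NULL);
    for (int i = 0; i < nreal; i++) {
        nodes[i].type = NODE_REAL;
        nodes[i].value = i + 1;
        list_insert_after(&l, l.tail, &nodes[i]);
    }
    for (n = scan_first(&l, &marker); n != NULL; n = scan_next(&l, &marker))
        sum += n->value;
    *yields_out = l.yields;
    free(nodes);
    return sum;
}
```

Because the counter lives in the marker rather than in any real node, nothing an ordinary node stores is overwritten, which is the property the out-of-band marker gives the patch as well.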
Bruce From owner-freebsd-net@FreeBSD.ORG Mon Dec 24 22:56:28 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F47516A418; Mon, 24 Dec 2007 22:56:28 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 25E4113C448; Mon, 24 Dec 2007 22:56:28 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBOMuSWQ043014; Mon, 24 Dec 2007 22:56:28 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBOMuSNF043010; Mon, 24 Dec 2007 22:56:28 GMT (envelope-from remko) Date: Mon, 24 Dec 2007 22:56:28 GMT Message-Id: <200712242256.lBOMuSNF043010@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: bin/118987: ifconfig -l [address_family] does not work correct on RELENG-7 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Dec 2007 22:56:28 -0000 Synopsis: ifconfig -l [address_family] does not work correct on RELENG-7 Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: remko Responsible-Changed-When: Mon Dec 24 22:56:27 UTC 2007 Responsible-Changed-Why: Over to maintainer. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=118987 From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 00:40:12 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6584816A419 for ; Tue, 25 Dec 2007 00:40:12 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3BF6813C467 for ; Tue, 25 Dec 2007 00:40:12 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id A410746B7E; Mon, 24 Dec 2007 19:40:11 -0500 (EST) Date: Tue, 25 Dec 2007 00:40:11 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Vlad GALU In-Reply-To: <20071224113901.M40176@fledge.watson.org> Message-ID: <20071225003638.N55818@fledge.watson.org> References: <4755EFDD.8070609@isc.org> <20071205021851.V87930@fledge.watson.org> <20071205093244.U87930@fledge.watson.org> <20071205094657.P87930@fledge.watson.org> <20071224113901.M40176@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Peter Losher Subject: Re: Zero-copy BPF update (was: Re: Aggregating many ports into one for tcpdump server.) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 00:40:12 -0000 On Mon, 24 Dec 2007, Robert Watson wrote: > I've put up an updated tarball based on some recent changes here: > > http://www.watson.org/~robert/freebsd/20071226-zcopybpf.tgz Unfortunately, there was a problem with a change I made to the kernel check for a userspace notification that a buffer was available via shared memory, so I have updated the tarball. 
While there, I've made the following further changes: - bpf.4 has been updated to reflect the shared memory interface. - I've eliminated the BPF_ZEROCOPY environmental variable in libpcap; libpcap will try to use zero-copy BPF if it is compiled into the kernel and enabled. - The kernel now has a net.bpf.zerocopy_enable sysctl, set to 1 by default, which controls whether the kernel will allow new zero-copy BPF sessions to be created. - Memory barriers and atomic operations have been introduced on the shared memory interface to improve correctness on platforms with weaker memory consistency. Robert N M Watson Computer Laboratory University of Cambridge > > The main changes since this last drop are: > > - BPF_ZERO_COPY environmental variable renamed to BPF_ZEROCOPY to match > kernel > option name. > > - libpcap support for zero-copy BPF buffers reworked to avoid unconditional > call to select() for each buffer when there's already a pending buffer > available to use; in general, avoid system calls entirely when there's data > already waiting, only use system calls when there isn't a completed buffer > to work on next. > > - Comments cleanup and some code cleanup. > > - A README to provide a little more guidance on getting it working. :-) > > You will need to "make clean ; make ; make install" in the modified libpcap > against, as the size of pcap_t has changed. In principle "make ; make > install" should DTRT, but it appears not to for me. 
> > Robert N M Watson > Computer Laboratory > University of Cambridge > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 01:17:07 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E2E316A41A for ; Tue, 25 Dec 2007 01:17:07 +0000 (UTC) (envelope-from maf@splintered.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id CEC0213C448 for ; Tue, 25 Dec 2007 01:17:06 +0000 (UTC) (envelope-from maf@splintered.net) Received: (qmail 19758 invoked from network); 25 Dec 2007 01:17:05 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 25 Dec 2007 01:17:05 -0000 In-Reply-To: <20071224131906.GB57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> <20071224131906.GB57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <833223E8-B1ED-4358-A992-F3789E4B3070@splintered.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Mon, 24 Dec 2007 20:16:50 -0500 To: Kostik Belousov X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@FreeBSD.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 25 Dec 2007 01:17:07 -0000 On Dec 24, 2007, at 8:19 AM, Kostik Belousov wrote: > > Mark, could you, please, retest the patch below in your setup ? > I want to put a change or some edition of it into the 7.0 release, and > we need to move fast to do this. It's building now. The testing will run overnight. Your patch to ffs_sync() and vfs_msync() stopped the periodic packet loss, but other file system activity such as (cd /; tar -cf - .) > /dev/ null will cause dropped packets. Same behavior, packets never make it up to the IP layer. -- mark From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 05:27:56 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D6F316A419; Tue, 25 Dec 2007 05:27:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 50E3A13C4D9; Tue, 25 Dec 2007 05:27:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J72KW-0003Rc-O5; Tue, 25 Dec 2007 07:27:55 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBP5RnJ7004058; Tue, 25 Dec 2007 07:27:49 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBP5Rnba004057; Tue, 25 Dec 2007 07:27:49 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 25 Dec 2007 07:27:49 +0200 From: Kostik Belousov To: Mark Fullmer Message-ID: <20071225052749.GE57756@deviant.kiev.zoral.com.ua> References: 
<20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> <20071224131906.GB57756@deviant.kiev.zoral.com.ua> <833223E8-B1ED-4358-A992-F3789E4B3070@splintered.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="wwSkEpePV3aFlXly" Content-Disposition: inline In-Reply-To: <833223E8-B1ED-4358-A992-F3789E4B3070@splintered.net> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: bb9f355b5be917825c280d06cf0e20ab X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1960 [Dec 24 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 05:27:56 -0000 --wwSkEpePV3aFlXly Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Dec 24, 2007 at 08:16:50PM -0500, Mark Fullmer wrote: > > On Dec 24, 2007, at 8:19 AM, Kostik Belousov wrote: > > > > >Mark, could you, please, retest the patch below in your setup ? > >I want to put a change or some edition of it into the 7.0 release, and > >we need to move fast to do this. > > It's building now. The testing will run overnight.
> > Your patch to ffs_sync() and vfs_msync() stopped the periodic packet loss, > but other file system activity such as (cd /; tar -cf - .) > /dev/null will > cause dropped packets. Same behavior, packets never make it up to the > IP layer. What fs do you use ? If FFS, are softupdates turned on ? Please, show the total time spent in the softdepflush process. Also, try to add the FULL_PREEMPTION kernel config option and report whether it helps. --wwSkEpePV3aFlXly Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHcJTUC3+MBN1Mb4gRAp8rAJ48R0XeTKFoQjBv9GihgvneGikMgQCeMACM VWaDAdA/fUzq+OtvQA7tdY0= =pqDG -----END PGP SIGNATURE----- --wwSkEpePV3aFlXly-- From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 10:36:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA6A816A417; Tue, 25 Dec 2007 10:36:38 +0000 (UTC) (envelope-from w@wrzask.pl) Received: from mx.oak.pl (mx.oak.pl [217.96.108.251]) by mx1.freebsd.org (Postfix) with ESMTP id B4A0313C442; Tue, 25 Dec 2007 10:36:38 +0000 (UTC) (envelope-from w@wrzask.pl) Received: by oak.pl (Postfix, from userid 1002) id D52D01CD4B; Tue, 25 Dec 2007 11:36:36 +0100 (CET) Date: Tue, 25 Dec 2007 11:36:36 +0100 From: Jan Srzednicki To: Maxim Konovalov Message-ID: <20071225103636.GA89362@oak.pl> References: <20071127135320.GJ2045@oak.pl> <474DB1D0.3010100@elischer.org> <20071128183001.GQ2045@oak.pl> <474DB6B3.1020202@elischer.org> <20071130093628.GS2045@oak.pl> <20071225133012.M40739@mp2.macomnet.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071225133012.M40739@mp2.macomnet.net> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: freebsd-net@freebsd.org, Adrian Chadd , freebsd-stable@freebsd.org, Julian Elischer Subject: Re: connect() returns EADDRINUSE during massive
host->host conn rate X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 10:36:39 -0000 On Tue, Dec 25, 2007 at 01:30:36PM +0300, Maxim Konovalov wrote: > On Fri, 30 Nov 2007, 19:26+0900, Adrian Chadd wrote: > > > Finding out more about the socket thats been created and what its > > clashing with might help. I'd do it myself but I'm not sure how to > > duplicate the issue. > > > Have you tried to turn net.inet.ip.portrange.randomized off? Yes, it didn't change anything. -- Jan Srzednicki :: http://wrzask.pl/ "Remember, remember, the fifth of November" -- V for Vendetta From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 10:55:50 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 28B6716A41A; Tue, 25 Dec 2007 10:55:50 +0000 (UTC) (envelope-from maxim@macomnet.ru) Received: from mp2.macomnet.net (mp2.macomnet.net [195.128.64.6]) by mx1.freebsd.org (Postfix) with ESMTP id B744D13C442; Tue, 25 Dec 2007 10:55:49 +0000 (UTC) (envelope-from maxim@macomnet.ru) Received: from localhost (localhost.int.ru [127.0.0.1] (may be forged)) by mp2.macomnet.net (8.13.7/8.13.8) with ESMTP id lBPAUa5N010042; Tue, 25 Dec 2007 13:30:37 +0300 (MSK) (envelope-from maxim@macomnet.ru) Date: Tue, 25 Dec 2007 13:30:36 +0300 (MSK) From: Maxim Konovalov To: Adrian Chadd In-Reply-To: Message-ID: <20071225133012.M40739@mp2.macomnet.net> References: <20071127135320.GJ2045@oak.pl> <474DB1D0.3010100@elischer.org> <20071128183001.GQ2045@oak.pl> <474DB6B3.1020202@elischer.org> <20071130093628.GS2045@oak.pl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-net@freebsd.org, Jan Srzednicki , Julian Elischer , freebsd-stable@freebsd.org Subject: Re: connect() returns EADDRINUSE during 
massive host->host conn rate X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 10:55:50 -0000 On Fri, 30 Nov 2007, 19:26+0900, Adrian Chadd wrote: > On 30/11/2007, Jan Srzednicki wrote: > > > Most of the relevant sockets (that is, between the two host > > mentioned) are in the ESTABLISHED state (200-400 of those). Only > > 20-40 are in TIME_WAIT state (these tend to be from a more > > ephemeric POP3 service). Most of the EADDRINUSE happen for the > > IMAP4 service. > > I'd probably start by patching the places in the tcp code > (src/sys/netinet/tcp_usrreq.c) which returns this error > (Its returned in other places but that seems to me to be the most > likely from your description.) > > Insert some code to print out information about the current socket and > the "oinp" value returned from in_pcbconnect_setup() (if this is the > place where the error occured.) > > Finding out more about the socket thats been created and what its > clashing with might help. I'd do it myself but I'm not sure how to > duplicate the issue. > Have you tried to turn net.inet.ip.portrange.randomized off? 
-- Maxim Konovalov From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 12:44:15 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7485316A418 for ; Tue, 25 Dec 2007 12:44:15 +0000 (UTC) (envelope-from jordi.espasa@opengea.org) Received: from mail.opengea.org (234.pool85-48-253.static.orange.es [85.48.253.234]) by mx1.freebsd.org (Postfix) with ESMTP id 31E1813C442 for ; Tue, 25 Dec 2007 12:44:15 +0000 (UTC) (envelope-from jordi.espasa@opengea.org) Received: from localhost (tartarus [127.0.0.1]) by mail.opengea.org (Opengea.org Project MailServer) with ESMTP id CABABD5003E for ; Tue, 25 Dec 2007 13:21:27 +0100 (CET) X-Virus-Scanned: amavisd-new at opengea.org Received: from mail.opengea.org ([127.0.0.1]) by localhost (mail.opengea.org [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id rP5MlD-TEmqV for ; Tue, 25 Dec 2007 13:21:27 +0100 (CET) Received: from [192.168.1.33] (191.Red-88-25-68.staticIP.rima-tde.net [88.25.68.191]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: jordi.espasa@opengea.org) by mail.opengea.org (Opengea.org Project MailServer) with ESMTP id 5F351D50038 for ; Tue, 25 Dec 2007 13:21:27 +0100 (CET) Message-ID: <4770F5BF.40100@opengea.org> Date: Tue, 25 Dec 2007 13:21:19 +0100 From: Jordi Espasa Clofent User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 12:44:15 -0000 Hi all, I know how to monitoring the NIC IRQ's consume, with tools as vmstat (-i flag), systat (-vm 1) 
or netstat (-m, -i), but I don't know how to determine the maximum interrupts that these NICs can give. I've several SuperMicro servers with Intel Pro 1000 PT NICs, which are controlled by em(4) driver. I've done some performance tests (with tools as iperf or netperf) with great results, but I don't know exactly the hardware limits, because I don't know the maximum IRQ rate. Obviously, before post this present message I've read a lot of documentation provided by vendor (Intel in this case) but I've not found it. -- Thanks, Jordi Espasa Clofent From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 14:12:30 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED22C16A419; Tue, 25 Dec 2007 14:12:30 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D9CD913C44B; Tue, 25 Dec 2007 14:12:30 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from freefall.freebsd.org (kris@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBPECU5T002615; Tue, 25 Dec 2007 14:12:30 GMT (envelope-from kris@freefall.freebsd.org) Received: (from kris@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBPECU99002600; Tue, 25 Dec 2007 14:12:30 GMT (envelope-from kris) Date: Tue, 25 Dec 2007 14:12:30 GMT Message-Id: <200712251412.lBPECU99002600@freefall.freebsd.org> To: aseelye@eltopia.com, kris@FreeBSD.org, freebsd-net@FreeBSD.org From: kris@FreeBSD.org Cc: Subject: Re: kern/118879: [bge] [patch] bge has checksum problems on the 5703 chipset X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 14:12:31 -0000 Synopsis: [bge] [patch] bge has 
checksum problems on the 5703 chipset State-Changed-From-To: open->analyzed State-Changed-By: kris State-Changed-When: Tue Dec 25 14:11:09 UTC 2007 State-Changed-Why: This is most likely not a bug unless you can confirm the bad checksums from *ANOTHER* machine on the same link. When hardware checksum offload is in use, the OS does not compute the checksum (it happens in the NIC as the packet is transmitted) so tcpdump sees a "wrong" checksum. See the tcpdump manpage. http://www.freebsd.org/cgi/query-pr.cgi?pr=118879 From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 14:15:59 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 98E6B16A417; Tue, 25 Dec 2007 14:15:59 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8628813C458; Tue, 25 Dec 2007 14:15:59 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from freefall.freebsd.org (kris@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBPEFxqd006013; Tue, 25 Dec 2007 14:15:59 GMT (envelope-from kris@freefall.freebsd.org) Received: (from kris@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBPEFx90006009; Tue, 25 Dec 2007 14:15:59 GMT (envelope-from kris) Date: Tue, 25 Dec 2007 14:15:59 GMT Message-Id: <200712251415.lBPEFx90006009@freefall.freebsd.org> To: kris@FreeBSD.org, freebsd-net@FreeBSD.org, rwatson@FreeBSD.org From: kris@FreeBSD.org Cc: Subject: Re: kern/117717: [panic] Kernel panic with Bittorrent client. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 14:15:59 -0000 Synopsis: [panic] Kernel panic with Bittorrent client. 
Responsible-Changed-From-To: freebsd-net->rwatson Responsible-Changed-By: kris Responsible-Changed-When: Tue Dec 25 14:15:45 UTC 2007 Responsible-Changed-Why: Assign to rwatson at his request http://www.freebsd.org/cgi/query-pr.cgi?pr=117717 From owner-freebsd-net@FreeBSD.ORG Tue Dec 25 23:53:39 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B968E16A418; Tue, 25 Dec 2007 23:53:39 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 79B1713C459; Tue, 25 Dec 2007 23:53:39 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 061E646E89; Tue, 25 Dec 2007 18:53:39 -0500 (EST) Date: Tue, 25 Dec 2007 23:53:38 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: kris@FreeBSD.org, Jonathan Chen In-Reply-To: <200712251415.lBPEFx90006009@freefall.freebsd.org> Message-ID: <20071225234740.O91077@fledge.watson.org> References: <200712251415.lBPEFx90006009@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org Subject: Re: kern/117717: [panic] Kernel panic with Bittorrent client. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Dec 2007 23:53:39 -0000 On Tue, 25 Dec 2007, kris@FreeBSD.org wrote: > Synopsis: [panic] Kernel panic with Bittorrent client. 
> > Responsible-Changed-From-To: freebsd-net->rwatson > Responsible-Changed-By: kris > Responsible-Changed-When: Tue Dec 25 14:15:45 UTC 2007 > Responsible-Changed-Why: > Assign to rwatson at his request > > http://www.freebsd.org/cgi/query-pr.cgi?pr=117717 I'm happy to take a look at this PR, and have installed a 6.3 box with X11 and Deluge, from the RC2 ISOs, for this purpose. However, I've never used Deluge, so sample command lines and configuration files I could use would be very helpfull: robert@cinnamon-freebsd6:~> deluge no existing Deluge session Traceback (most recent call last): File "/usr/local/bin/deluge", line 119, in deluge.wizard.WizardGTK() File "/usr/local/lib/python2.5/site-packages/deluge/wizard.py", line 53, in __init__ pixmap = deluge.common.get_logo(48) File "/usr/local/lib/python2.5/site-packages/deluge/common.py", line 156, in get_logo size, size) gobject.GError: Unrecognized image file format Thanks, Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 03:44:48 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1D74516A417; Wed, 26 Dec 2007 03:44:48 +0000 (UTC) (envelope-from jonc@chen.org.nz) Received: from chen.org.nz (chen.org.nz [202.89.146.5]) by mx1.freebsd.org (Postfix) with ESMTP id AACEC13C43E; Wed, 26 Dec 2007 03:44:47 +0000 (UTC) (envelope-from jonc@chen.org.nz) Received: by chen.org.nz (Postfix, from userid 1001) id 9A4ED3F4C; Wed, 26 Dec 2007 16:28:08 +1300 (NZDT) Date: Wed, 26 Dec 2007 16:28:08 +1300 From: Jonathan Chen To: Robert Watson Message-ID: <20071226032808.GA21780@osiris.chen.org.nz> References: <200712251415.lBPEFx90006009@freefall.freebsd.org> <20071225234740.O91077@fledge.watson.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071225234740.O91077@fledge.watson.org> User-Agent: 
Mutt/1.4.2.3i Cc: freebsd-net@FreeBSD.org Subject: Re: kern/117717: [panic] Kernel panic with Bittorrent client. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 03:44:48 -0000 Hi Robert, On Tue, Dec 25, 2007 at 11:53:38PM +0000, Robert Watson wrote: > > On Tue, 25 Dec 2007, kris@FreeBSD.org wrote: > > >Synopsis: [panic] Kernel panic with Bittorrent client. > > > >Responsible-Changed-From-To: freebsd-net->rwatson > >Responsible-Changed-By: kris > >Responsible-Changed-When: Tue Dec 25 14:15:45 UTC 2007 > >Responsible-Changed-Why: > >Assign to rwatson at his request > > > >http://www.freebsd.org/cgi/query-pr.cgi?pr=117717 > > I'm happy to take a look at this PR, and have installed a 6.3 box with X11 > and Deluge, from the RC2 ISOs, for this purpose. However, I've never used > Deluge, so sample command lines and configuration files I could use would > be very helpfull: > > robert@cinnamon-freebsd6:~> deluge > no existing Deluge session > Traceback (most recent call last): > File "/usr/local/bin/deluge", line 119, in > deluge.wizard.WizardGTK() > File "/usr/local/lib/python2.5/site-packages/deluge/wizard.py", line 53, > in __init__ > pixmap = deluge.common.get_logo(48) > File "/usr/local/lib/python2.5/site-packages/deluge/common.py", line 156, > in get_logo > size, size) > gobject.GError: Unrecognized image file format Thanks for taking a look at this. I've never seen this particular error before; my installation of deluge didn't require any specific configuration, it just started up, presenting me with a dialog-wizard. Is it possible that your deluge installation is corrupt? I would give you a package if I could, but I've moved onto 7-STABLE so that I could use the client. Cheers. 
-- Jonathan Chen From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 09:08:28 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2012E16A418 for ; Wed, 26 Dec 2007 09:08:28 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.152]) by mx1.freebsd.org (Postfix) with ESMTP id 9317D13C4E7 for ; Wed, 26 Dec 2007 09:08:27 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so1715895fgg.35 for ; Wed, 26 Dec 2007 01:08:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=5mppPxqrLpLuy7JcAK7XwsYEQGGe97pDGribzkMqPmo=; b=Ue7plIXJ+X0BEmBU/bfZv/fgtmW4RWpCHJNEFidF0e73FLdL2WVNVQGXhMnpXO/TBxoLsGkQ81Wl5bIBFpOoLxuEyHvnlHi6djsyFu569V9E6xSuviXve5+G2Q/H0A/snqKlPruI3qx9otrqCXDtaac1zQXqUkDJg8HfqT9dDJI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=qFdzX3cZbGs5Z9ZL2RIG65mC9C6lhhK+OngxrhpcqTP0Q7KYxtpF5Udl6BmnW2tq/yo8ujicRfX5AFPojXAUVLgSRMT0nuIoMoGvWS7nVffmaI9GbH4ezwEzqwhbXQOjmE40VqgagRsyMV3PP9U4hUXwnU7QoNJoVznNaophc6c= Received: by 10.86.1.1 with SMTP id 1mr6467284fga.2.1198658400746; Wed, 26 Dec 2007 00:40:00 -0800 (PST) Received: by 10.86.97.10 with HTTP; Wed, 26 Dec 2007 00:40:00 -0800 (PST) Message-ID: <2a41acea0712260040h7ef404eby661d7eea68706209@mail.gmail.com> Date: Wed, 26 Dec 2007 00:40:00 -0800 From: "Jack Vogel" To: "Jordi Espasa Clofent" In-Reply-To: <4770F5BF.40100@opengea.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: 
inline References: <4770F5BF.40100@opengea.org> Cc: freebsd-net@freebsd.org Subject: Re: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 09:08:28 -0000 On Dec 25, 2007 4:21 AM, Jordi Espasa Clofent wrote: > Hi all, > > I know how to monitoring the NIC IRQ's consume, with tools as vmstat (-i > flag), systat (-vm 1) or netstat (-m, -i), but I don't know how to > determine the maximum interrupts that these NICs can give. > > I've several SuperMicro servers with Intel Pro 1000 PT NICs, which are > controlled by em(4) driver. I've done some performance tests (with tools > as iperf or netperf) with great results, but I don't know exactly the > hardware limits, because I don't know the maximum IRQ rate. > > Obviously, before post this present message I've read a lot of > documentation provided by vendor (Intel in this case) but I've not found it. I'm not sure of the purpose of the question, and I don't know that there is a specific number documented anywhere; the interrupt rate will depend on the rate at which interrupts are serviced. For instance, with our new 10G card you must set the interrupt storm threshold up to 8000 just so the kernel doesn't throttle you down :) With a 1G interface doing line rate I don't think it's necessary for it to get that high, so again, why the question?
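As a rough yardstick for the numbers discussed in this thread, the arithmetic below sketches how many Ethernet frames per second a 1 Gbit/s link can carry, and how many frames each interrupt would have to cover at a given interrupt rate. The per-frame overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap) is standard Ethernet. The function names are hypothetical, and the figure of 8000 interrupts per second is only an example value, not a documented em(4) limit.

```python
# Illustrative arithmetic only; names are hypothetical, not part of any driver.
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_frames_per_second(link_bps, frame_bytes):
    """Upper bound on frames per second for a given frame size (incl. FCS)."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def frames_per_interrupt(frames_per_sec, interrupts_per_sec):
    """Average number of frames each interrupt must cover at that rate."""
    return frames_per_sec / interrupts_per_sec


if __name__ == "__main__":
    gigabit = 1_000_000_000
    for size in (64, 1518):  # minimum and maximum standard frame sizes
        fps = max_frames_per_second(gigabit, size)
        per_irq = frames_per_interrupt(fps, 8000)  # example rate only
        print(f"{size:5d}-byte frames: {fps:12.0f} frames/s, "
              f"{per_irq:6.1f} frames per interrupt at 8000 int/s")
```

With minimum-size frames this comes to roughly 1.49 million frames per second, and with full-size 1518-byte frames to roughly 81,000. So an interrupt rate of a few thousand per second at line rate would mean that each interrupt services many frames, presumably because the hardware or driver batches them.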
Happy Holidays, Jack From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 10:12:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0591116A418 for ; Wed, 26 Dec 2007 10:12:38 +0000 (UTC) (envelope-from jordi.espasa@opengea.org) Received: from mail.opengea.org (234.pool85-48-253.static.orange.es [85.48.253.234]) by mx1.freebsd.org (Postfix) with ESMTP id AFDCA13C46E for ; Wed, 26 Dec 2007 10:12:36 +0000 (UTC) (envelope-from jordi.espasa@opengea.org) Received: from localhost (tartarus [127.0.0.1]) by mail.opengea.org (Opengea.org Project MailServer) with ESMTP id 2773FD5003E for ; Wed, 26 Dec 2007 11:13:09 +0100 (CET) X-Virus-Scanned: amavisd-new at opengea.org Received: from mail.opengea.org ([127.0.0.1]) by localhost (mail.opengea.org [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id a9HQ7SIo+j1L for ; Wed, 26 Dec 2007 11:13:09 +0100 (CET) Received: from [192.168.1.33] (191.Red-88-25-68.staticIP.rima-tde.net [88.25.68.191]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: jordi.espasa@opengea.org) by mail.opengea.org (Opengea.org Project MailServer) with ESMTP id 5E5B5D50038 for ; Wed, 26 Dec 2007 11:13:08 +0100 (CET) Message-ID: <47722927.5000106@opengea.org> Date: Wed, 26 Dec 2007 11:12:55 +0100 From: Jordi Espasa Clofent User-Agent: Thunderbird 2.0.0.6 (X11/20071022) MIME-Version: 1.0 To: freebsd-net@freebsd.org References: <4770F5BF.40100@opengea.org> <2a41acea0712260040h7ef404eby661d7eea68706209@mail.gmail.com> In-Reply-To: <2a41acea0712260040h7ef404eby661d7eea68706209@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 10:12:38 -0000 OK, I'll try to explain in another way. While I've done network performance test I've monitored the IRQ rate, and, for example, it's a 7000/8000 interrupts per second in every NIC (I use 2 NICs in a bridge). The question is ¿how can I know if this irq rate is too high or not? ¿how can I know if I'm closer to device limits, or kernel limits? I want to say that I'm don't know if 8000 irq per second means a high IRQ use or a lower user. I hope I've explained better at this time. -- Thanks, Jordi Espasa Clofent From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 10:41:51 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED59516A420 for ; Wed, 26 Dec 2007 10:41:50 +0000 (UTC) (envelope-from michael@staff.openaccess.org) Received: from smtp-out2.openaccess.org (smtp-out2.openaccess.org [66.114.32.175]) by mx1.freebsd.org (Postfix) with ESMTP id C810613C458 for ; Wed, 26 Dec 2007 10:41:50 +0000 (UTC) (envelope-from michael@staff.openaccess.org) Received: from smtp-nas.openaccess.org (smtp-nas.openaccess.org [66.114.32.169]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp-out2.openaccess.org (Postfix) with ESMTP id E9DF7797A6B; Wed, 26 Dec 2007 02:22:50 -0800 (PST) Received: from [192.168.2.151] (mono-sis1.s.bli.openaccess.org [66.114.32.149]) by smtp-nas.openaccess.org (Postfix) with ESMTP id 768FB61641F; Wed, 26 Dec 2007 02:22:50 -0800 (PST) Mime-Version: 1.0 (Apple Message framework v753) In-Reply-To: <47722927.5000106@opengea.org> References: <4770F5BF.40100@opengea.org> <2a41acea0712260040h7ef404eby661d7eea68706209@mail.gmail.com> <47722927.5000106@opengea.org> Message-Id: <8EAFDDF1-7A7D-45C4-A25A-50A3999D9438@staff.openaccess.org> From: Michael DeMan Date: Wed, 26 Dec 2007 02:22:51 -0800 To: freebsd-net@freebsd.org, Jordi 
Espasa Clofent X-Mailer: Apple Mail (2.753) Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 10:41:51 -0000 Hi, I think this is a really good question. I'm curious since we use a lot of stripped-down FreeBSD for modest performance routers. We typically enable POLLING on our interfaces not so much for performance (it seems to be a negligible improvement nowadays) but so that we know that our OSPF/BGP/SSH processes are always responsive. I'd be curious if anybody could get back on this. I've never even considered things from the perspective of how many interrupts a NIC could generate other than that they could always generate too many. - mike On Dec 26, 2007, at 2:12 AM, Jordi Espasa Clofent wrote: > OK, I'll try to explain in another way. > > While I've done network performance test I've monitored the IRQ > rate, and, for example, it's a 7000/8000 interrupts per second in > every NIC (I use 2 NICs in a bridge). The question is > > ¿how can I know if this irq rate is too high or not? ¿how can I > know if I'm closer to device limits, or kernel limits? > > I want to say that I'm don't know if 8000 irq per second means a > high IRQ use or a lower user. > > I hope I've explained better at this time.
> > -- > Thanks, > Jordi Espasa Clofent > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 11:37:17 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13D8B16A420 for ; Wed, 26 Dec 2007 11:37:17 +0000 (UTC) (envelope-from oceanare@pacific.net.sg) Received: from smtpgate2.pacific.net.sg (smtpgate2.pacific.net.sg [203.120.90.32]) by mx1.freebsd.org (Postfix) with SMTP id 5650E13C442 for ; Wed, 26 Dec 2007 11:37:15 +0000 (UTC) (envelope-from oceanare@pacific.net.sg) Received: (qmail 10652 invoked from network); 26 Dec 2007 11:10:34 -0000 Received: from adsl172.dyn112.pacific.net.sg (HELO P2120.somewherefaraway.com) (oceanare@210.24.112.172) by smtpgate2.pacific.net.sg with ESMTPA; 26 Dec 2007 11:10:33 -0000 Message-ID: <4772369D.5020608@pacific.net.sg> Date: Wed, 26 Dec 2007 19:10:21 +0800 From: Erich Dollansky User-Agent: Thunderbird 2.0.0.6 (X11/20070826) MIME-Version: 1.0 To: Jordi Espasa Clofent References: <4770F5BF.40100@opengea.org> <2a41acea0712260040h7ef404eby661d7eea68706209@mail.gmail.com> <47722927.5000106@opengea.org> In-Reply-To: <47722927.5000106@opengea.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 11:37:17 -0000 Hi, Jordi Espasa Clofent wrote: > > I want to say that I'm don't know if 8000 irq per second means a high > IRQ use or a lower user.
I must say that I have not done hardware work for some time, but 10,000 interrupts per second is not that high. Modern CPUs should be able to handle much more. So the limit will be the operating system's and the driver's ability to finish their job before the next interrupt arrives. Enabling polling helps with slower CPUs. The maximum possible interrupt rate is given by the combination of the hardware and the CPU's ability to react, save its current state, do something, and restore the previous state. This value will be of little use to you, as only hardware developers use this information to make sure that what they plan is possible. Erich From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 12:05:09 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2600816A41A; Wed, 26 Dec 2007 12:05:09 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0BA3013C459; Wed, 26 Dec 2007 12:05:09 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBQC583k030233; Wed, 26 Dec 2007 12:05:08 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBQC58rr030224; Wed, 26 Dec 2007 12:05:08 GMT (envelope-from linimon) Date: Wed, 26 Dec 2007 12:05:08 GMT Message-Id: <200712261205.lBQC58rr030224@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/119036: [netipsec] [patch] enc(4) and dummynet together produce kernel panics X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 12:05:09 -0000 Synopsis: [netipsec] [patch] enc(4) and dummynet together produce kernel panics Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Wed Dec 26 12:04:44 UTC 2007 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=119036 From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 12:54:40 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D8D916A468 for ; Wed, 26 Dec 2007 12:54:40 +0000 (UTC) (envelope-from maf@splintered.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 30C3E13C44B for ; Wed, 26 Dec 2007 12:54:40 +0000 (UTC) (envelope-from maf@splintered.net) Received: (qmail 92089 invoked from network); 26 Dec 2007 12:54:38 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) 
(192.148.251.71) by sv1.eng.oar.net with SMTP; 26 Dec 2007 12:54:38 -0000 In-Reply-To: <20071225052749.GE57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> <20071224131906.GB57756@deviant.kiev.zoral.com.ua> <833223E8-B1ED-4358-A992-F3789E4B3070@splintered.net> <20071225052749.GE57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Wed, 26 Dec 2007 07:54:18 -0500 To: Kostik Belousov X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 12:54:40 -0000 On Dec 25, 2007, at 12:27 AM, Kostik Belousov wrote: > > What fs do you use ? If FFS, are softupdates turned on ? Please, > show the > total time spent in the softdepflush process. > > Also, try to add the FULL_PREEMPTION kernel config option and report > whether it helps. FFS with soft updates on all filesystems. With your latest uio_yield() in MNT_VNODE_FOREACH patch it's a little harder to provoke packet loss. Standard nightly crontabs and a tar -cf - / > /dev/null no longer cause drops. A make buildkernel will though. root 38 0.0 0.0 0 8 ?? DL Mon08PM 0:04.62 [softdepflush] Building a new kernel with KTR and FULL_PREEMPTION now. 
-- mark From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 14:33:17 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 458DF16A418; Wed, 26 Dec 2007 14:33:17 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1140113C4E8; Wed, 26 Dec 2007 14:33:12 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <47726624.6020805@FreeBSD.org> Date: Wed, 26 Dec 2007 15:33:08 +0100 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Mark Fullmer References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> <20071223095314.G1323@delplex.bde.org> <20071224131906.GB57756@deviant.kiev.zoral.com.ua> <833223E8-B1ED-4358-A992-F3789E4B3070@splintered.net> <20071225052749.GE57756@deviant.kiev.zoral.com.ua> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Kostik Belousov , freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 14:33:17 -0000 Mark Fullmer wrote: > > On Dec 25, 2007, at 12:27 AM, Kostik Belousov wrote: > >> >> What fs do you use ? If FFS, are softupdates turned on ? Please, show the >> total time spent in the softdepflush process. >> >> Also, try to add the FULL_PREEMPTION kernel config option and report >> whether it helps. > > FFS with soft updates on all filesystems. 
> > With your latest uio_yield() in MNT_VNODE_FOREACH patch it's a > little harder to provoke packet loss. Standard nightly > crontabs and a tar -cf - / > /dev/null no longer cause drops. A > make buildkernel will though. > > root 38 0.0 0.0 0 8 ?? DL Mon08PM 0:04.62 > [softdepflush] > > Building a new kernel with KTR and FULL_PREEMPTION now. FYI FULL_PREEMPTION causes performance loss in other situations. Kris From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 16:10:31 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6659916A419 for ; Wed, 26 Dec 2007 16:10:31 +0000 (UTC) (envelope-from trashy_bumper@yahoo.com) Received: from web36309.mail.mud.yahoo.com (web36309.mail.mud.yahoo.com [209.191.91.186]) by mx1.freebsd.org (Postfix) with SMTP id 35FD513C442 for ; Wed, 26 Dec 2007 16:10:31 +0000 (UTC) (envelope-from trashy_bumper@yahoo.com) Received: (qmail 86382 invoked by uid 60001); 26 Dec 2007 16:10:30 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type:Message-ID; b=s5Q2+Qvuh3kkVomjpooJ7Akoc6dE4Dtd2WSmdVImSOj0ulPsRGLDtqQ29tQfecv/CMBZnn6RTGkx4EwHmEMiyAuZuu/aii1V18OEuDT0V6WdqlACImJ/N2+KJLIMjDFGrl8W0ztzYStt7heVi1PGryxD8pZoY8nLCnsJtnUfWrg=; X-YMail-OSG: MxIAaakVM1lFRmJui_IDV1GBpFWCuriXZ.f8zH1Pe6SfcPDcz.xbBZMuZd1DnVZtsBMVhGIlnvIyKpHtfQxpz5u9umX6kznoAnHEjljcGoZUnk4pgcc- Received: from [77.122.205.244] by web36309.mail.mud.yahoo.com via HTTP; Wed, 26 Dec 2007 08:10:30 PST X-Mailer: YahooMailRC/818.31 YahooMailWebService/0.7.158.1 Date: Wed, 26 Dec 2007 08:10:30 -0800 (PST) From: Nash Nipples To: freebsd-net@freebsd.org MIME-Version: 1.0 Message-ID: <508610.85778.qm@web36309.mail.mud.yahoo.com> Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: Maximum NIC 
interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 16:10:31 -0000

Dear Jordi,

In theory, on a Gigabit link you get 1 000 000 000 bits per second.
By default you have the MTU set to 1500 bytes, which makes ~12 000 bits.
1 000 000 000 / 12 000 = ~83 333 packets per second.
83 333 packets per second makes 0.083333 packets per microsecond.
1 / 0.08333 = 12.0 microseconds per packet. Thus one can interrupt the CPU
at a rate of ~83 333 times per second. If you use smaller packet sizes you
might get even more funny numbers.

8000 is quite a low number. The driver was developed by guys
at Intel. I don't see a reason to worry.

By the way, they have products with Interrupt Moderation.
http://www.intel.com/design/network/applnots/ap450.htm

The question is really amazing. Thanks, it has tickled me big time.

Sincerely,

Nash

----- Original Message ----
From: Jordi Espasa Clofent
To: freebsd-net@freebsd.org
Sent: Wednesday, December 26, 2007 12:12:55 PM
Subject: Re: Maximum NIC interrupts

OK, I'll try to explain it another way.

While running network performance tests I monitored the IRQ rate,
and, for example, it is 7000 to 8000 interrupts per second on each NIC
(I use 2 NICs in a bridge). The question is:

How can I know if this IRQ rate is too high or not? How can I know
if I'm close to device limits, or kernel limits?

I mean that I don't know if 8000 IRQs per second is a high
IRQ rate or a low one.

I hope I've explained it better this time.

-- 
Thanks,
Jordi Espasa Clofent

From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 17:06:10 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 068AE16A46B; Wed, 26 Dec 2007 17:06:10 +0000 (UTC) (envelope-from thompsa@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DF43813C448; Wed, 26 Dec 2007 17:06:09 +0000 (UTC) (envelope-from thompsa@FreeBSD.org) Received: from freefall.freebsd.org (thompsa@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBQH69vf049376; Wed, 26 Dec 2007 17:06:09 GMT (envelope-from thompsa@freefall.freebsd.org) Received: (from thompsa@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBQH690Q049372; Wed, 26 Dec 2007 17:06:09 GMT (envelope-from thompsa) Date: Wed, 26 Dec 2007 17:06:09 GMT Message-Id: <200712261706.lBQH690Q049372@freefall.freebsd.org> To: thompsa@FreeBSD.org, freebsd-net@FreeBSD.org, thompsa@FreeBSD.org From: thompsa@FreeBSD.org Cc: Subject: Re: kern/119036: [netipsec] [patch] enc(4) and dummynet together produce kernel panics X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 17:06:10 -0000

Synopsis: [netipsec] [patch] enc(4) and dummynet together produce kernel panics

Responsible-Changed-From-To: freebsd-net->thompsa
Responsible-Changed-By: thompsa
Responsible-Changed-When: Wed Dec 26 17:02:25 UTC 2007
Responsible-Changed-Why:
I'll grab this one. I was forwarded the local patch from the m0n0wall repo a few days ago and have fixed it in a slightly different way, see r1.8 src/sys/net/if_enc.c

Thanks for finding/fixing it and the PR.

http://www.freebsd.org/cgi/query-pr.cgi?pr=119036

From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 18:35:11 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F52416A418; Wed, 26 Dec 2007 18:35:11 +0000 (UTC) (envelope-from jcmichot@flash.usenet-fr.net) Received: from flash.usenet-fr.net (flash.usenet-fr.net [88.174.64.5]) by mx1.freebsd.org (Postfix) with ESMTP id BE7E013C474; Wed, 26 Dec 2007 18:35:10 +0000 (UTC) (envelope-from jcmichot@flash.usenet-fr.net) Received: by flash.usenet-fr.net (Postfix, from userid 200) id 9C79195AE3; Wed, 26 Dec 2007 19:03:27 +0100 (CET) Date: Wed, 26 Dec 2007 19:03:27 +0100 From: Jean-Claude MICHOT To: freebsd-performance@freebsd.org, freebsd-net@freebsd.org Message-ID: <20071226180327.GA39735@flash.usenet-fr.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.2i Cc: Subject: DELL PowerEdge 860 and Broadcom Gigabit Ethernet poor performance.
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 18:35:11 -0000

The server is a DELL PowerEdge 860 freshly installed with
FreeBSD 7.0-BETA4 (GENERIC kernel).

pciconf and part of the boot information:

bge0@pci0:4:0:0: class=0x020000 card=0x01e61028 chip=0x165914e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5721 NetXtreme Gigabit Ethernet PCI Express'
    class    = network
    subclass = ethernet
bge1@pci0:5:0:0: class=0x020000 card=0x01e61028 chip=0x165914e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5721 NetXtreme Gigabit Ethernet PCI Express'
    class    = network
    subclass = ethernet

bge0: mem 0xfe5f0000-0xfe5fffff irq 16 at device 0.0 on pci4
miibus0: on bge0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:1c:23:e1:78:7e
bge0: [ITHREAD]
bge1: mem 0xfe3f0000-0xfe3fffff irq 17 at device 0.0 on pci5
miibus1: on bge1
brgphy1: PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:1c:23:e1:78:7f
bge1: [ITHREAD]

There's no problem with input throughput (up to 980 Mbit/s), but output
throughput never goes above 540 Mbit/s :(

Just as a test, I added an Intel Gigabit Ethernet board (em) to this server,
and there's no problem sending data at up to around 980 Mbit/s with
this add-on board.

If I boot the server with Ubuntu Linux, there's no output throughput
problem with the Broadcom NICs, so it seems to be a FreeBSD bge driver problem.
I'm not the only one to have poor output performance with bge on
DELL PowerEdge 860
http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014373.html

I have also tried various patches and changes to the driver's default values
http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015951.html
http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015956.html

But all these attempts to get better output performance were unsuccessful :(

Any ideas are welcome.

PS: If it's useful for debugging and trying to fix the problem, I can provide
root access to a DELL PE860 test server with bge.

-- 
UNIX is user-friendly. it's just selective about who its friends are.

From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 19:50:56 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3B8B16A421 for ; Wed, 26 Dec 2007 19:50:56 +0000 (UTC) (envelope-from netslists@gmail.com) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.244]) by mx1.freebsd.org (Postfix) with ESMTP id 840B313C44B for ; Wed, 26 Dec 2007 19:50:56 +0000 (UTC) (envelope-from netslists@gmail.com) Received: by an-out-0708.google.com with SMTP id c14so531958anc.13 for ; Wed, 26 Dec 2007 11:50:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:user-agent:mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding; bh=wSuie2gWE8WpbKCkvIWWJxklWfv+MIevEV0YUTyLceE=; b=w5BSizMm7b9heSWMHD49ojencmMvKmrgFgpI72A+pZsEj8h+G+L5jg7MFb8uESJpmhHuLqZAUwnL0G09kvn6GyttxW0ArRxre3FZMc6E+r4M/xdrj8RSHm9kSN7F4x710YiB1qVd+dlf9SXE8qc6xnRk3GTlHHXpIhBLjRo9yQU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding;
b=aKtKby+FDlXQh9dMas5D50q1tHtlBIBU1H8v/lHgBNTO1w+WqlyIsX6QUsdXZm4d6g/TqenFfjMTl1gfOO+B51B6Ktffvs1FHeWP2K+ammsKw8m3sdECbbE4o3pgKSj3Y48G9H8eSlgpzvUllMDxG2zNSSkzBnDxW+jmSOA0oNo= Received: by 10.100.110.15 with SMTP id i15mr14674910anc.76.1198697087677; Wed, 26 Dec 2007 11:24:47 -0800 (PST) Received: from ?192.168.12.8? ( [97.101.40.241]) by mx.google.com with ESMTPS id c40sm7839142anc.16.2007.12.26.11.24.46 (version=SSLv3 cipher=RC4-MD5); Wed, 26 Dec 2007 11:24:46 -0800 (PST) Message-ID: <4772AA7C.1020206@gmail.com> Date: Wed, 26 Dec 2007 14:24:44 -0500 From: Sten Daniel Soersdal User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Jean-Claude MICHOT References: <20071226180327.GA39735@flash.usenet-fr.net> In-Reply-To: <20071226180327.GA39735@flash.usenet-fr.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org Subject: Re: DELL PowerEdge 860 and Broadcom Gigabit Ethernet poor performance. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 19:50:56 -0000 Jean-Claude MICHOT wrote: > The server is a DELL PowerEdge 860 freshly installed with > FreeBSD 7.0-BETA4 (GENERIC Kernel). 
> > pciconf and part of boot information: > > bge0@pci0:4:0:0: class=0x020000 card=0x01e61028 chip=0x165914e4 rev=0x11 hdr=0x00 > vendor = 'Broadcom Corporation' > device = 'BCM5721 NetXtreme Gigabit Ethernet PCI Express' > class = network > subclass = ethernet > bge1@pci0:5:0:0: class=0x020000 card=0x01e61028 chip=0x165914e4 rev=0x11 hdr=0x00 > vendor = 'Broadcom Corporation' > device = 'BCM5721 NetXtreme Gigabit Ethernet PCI Express' > class = network > subclass = ethernet > > bge0: mem 0xfe5f0000-0xfe5fffff irq 16 at device 0.0 on pci4 > miibus0: on bge0 > brgphy0: PHY 1 on miibus0 > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto > bge0: Ethernet address: 00:1c:23:e1:78:7e > bge0: [ITHREAD] > bge1: mem 0xfe3f0000-0xfe3fffff irq 17 at device 0.0 on pci5 > miibus1: on bge1 > brgphy1: PHY 1 on miibus1 > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto > bge1: Ethernet address: 00:1c:23:e1:78:7f > bge1: [ITHREAD] > > There's no problem with input throughput (upto 980 Mbits) but output > throughput never go upper to 540 Mbits :( > > Just for test, i have add to this server an Intel Gigabit Ethernet board > (em) and there's no problem to output data up to around 980 Mbits with > this addon board. > > If i boot the server with Linux Ubuntu, there's no output throughput > problem with Broadcom, so it seem to be FreeBSD bge driver problem. > > I'm not the only one to have poor output performance with bge on > DELL PowerEdge 860 > http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014373.html > > I have also try various patch or setup driver default values > http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015951.html > http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015956.html > > But all theses attempts to get better ouput performance was unsuccessful :( > > Any idea are welcome. 
> > PS: If it's usefull to debug and try to fix the problem, i can provide > root access to a DELL PE860 test server with bge. > Have you tried setting the tcp send and receive windows? The defaults are: net.inet.tcp.sendspace: 32768 net.inet.tcp.recvspace: 65536 Also you might want to try to lower: net.inet.tcp.delacktime: 100 -- Sten Daniel Soersdal From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 21:08:58 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEAA516A418 for ; Wed, 26 Dec 2007 21:08:58 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.158]) by mx1.freebsd.org (Postfix) with ESMTP id 66ED213C44B for ; Wed, 26 Dec 2007 21:08:58 +0000 (UTC) (envelope-from jfvogel@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so1846702fgg.35 for ; Wed, 26 Dec 2007 13:08:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=D7RlBniS6Jtn/1wRHqxBoQxcX7BcbUwLvy744DuB2T0=; b=GcTpd3ldUwTryAbrl038YIf2qAukVRlFfm2DL3RHb/b/Kf04AP1iH6rwmhoqokBLf9QArbbeL4kzzyMwbbH9IexPjiLtGJgA8rlHbFnHe2I+1lbp+WJ1JlfUObmb97YPdc1v8fAG3hpT7Yxrv2Oi27ciAWkJki7e5g3Qu4k4kUk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=oVvJSOvFo55heG8R3Lx2bi4IwUEf8rt/7ERmNQYMTtTzjc5jwZXQuhUG4dkoJUX8NbPe0pqGRULEm5vTKs1q0hc5wU4IZG5eUIS90EszfwOGem734vv2PiIr1Pi+FtFGI4r0Iur+U1ovwulXjNQedXW7dJ78GqKDMrRDzVzkIek= Received: by 10.86.84.5 with SMTP id h5mr7045465fgb.75.1198703337058; Wed, 26 Dec 2007 13:08:57 -0800 (PST) Received: by 10.86.97.10 with HTTP; Wed, 26 Dec 2007 
13:08:57 -0800 (PST) Message-ID: <2a41acea0712261308y7ac8d831i4e02c7849bbca3f6@mail.gmail.com> Date: Wed, 26 Dec 2007 13:08:57 -0800 From: "Jack Vogel" To: "Nash Nipples" In-Reply-To: <508610.85778.qm@web36309.mail.mud.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <508610.85778.qm@web36309.mail.mud.yahoo.com> Cc: freebsd-net@freebsd.org Subject: Re: Maximum NIC interrupts X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 21:08:58 -0000 On Dec 26, 2007 8:10 AM, Nash Nipples wrote: > Dear Jordi, > > In theory, on a Gigabit link you get 1 000 000 000 bits * second. > By default you have the MTU set to 1500 bytes which makes ~12 000 bits. > 1 000 000 000 / 12 000 = ~ 83 333 packets per second. > 83 333 packets per second makes 0.083333 packets per microsecond. > 1 / 0.08333 = 12.0 microseconds per packet. Thus one can interrupt CPU > at a rate of ~83 333 times per second. If you use lower packets sizes you > might get even more funny numbers. > > 8000 is a quiet low number. The driver was developed by guys > at Intel. I don't see a reason to worry. > > By the way they have products with Interrupt Moderation. > http://www.intel.com/design/network/applnots/ap450.htm Yes, one of the items in my queue is AIM, adaptive interrupt moderation, the Linux driver has this, my coworker Jesse Brandeburg developed that code, and I hope to do something similar for the em driver. 
Anyway, I'm still on vacation and don't want to distract myself from music, but look for AIM sometime this new year :)

Jack

From owner-freebsd-net@FreeBSD.ORG Wed Dec 26 23:48:35 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 703AC16A46B for ; Wed, 26 Dec 2007 23:48:35 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outR.internet-mail-service.net (outR.internet-mail-service.net [216.240.47.241]) by mx1.freebsd.org (Postfix) with ESMTP id 57C9713C44B for ; Wed, 26 Dec 2007 23:48:35 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 26 Dec 2007 15:48:34 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id CC2C9126D82; Wed, 26 Dec 2007 15:48:33 -0800 (PST) Message-ID: <4772E859.3090005@elischer.org> Date: Wed, 26 Dec 2007 15:48:41 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: FreeBSD Net , arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Li, Qing" , Robert Watson Subject: multiple routing tables roadmap X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Dec 2007 23:48:35 -0000

One area where FreeBSD has been falling behind, and which by chance I have some time to work on, is "policy based routing", which allows different packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree (and by extension 7.x), but FreeBSD in general needs it, so I might as well do it in -current and back-port the portions I need.

One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next-hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "policies" referred to in "policy based routing".

One of the constraints I have if I try to back-port this work to 6.x is that it must be implemented as an EXTENSION to the existing ABIs in 6.x, so that third-party applications do not need to be recompiled within the lifespan of the branch.

Implementation method, (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x (also in Perforce, though not yet caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (sufficient for my purposes in 6.x) and implements the changes needed to allow IPv4 to use them. I have not done the changes for IPv6 simply because I do not need it, and I do not have enough knowledge of IPv6 (e.g. neighbor discovery) to do it. Other protocol families are left untouched, and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB code starts everything off with a single-dimensional array of pointers to FIB head structures (one per protocol family), each of which in turn points to the trie of routes available to that family.
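
As a rough illustration of the layout just described, here is a small Python sketch. It is not kernel code: the names rt_tables_1d, AF_INET_DEMO and AF_APPLETALK_DEMO and the sample routes are made up for the example, and a plain list searched for the longest matching prefix stands in for the real trie.

```python
import ipaddress

# Hypothetical stand-ins for address family indices (not the kernel's values).
AF_INET_DEMO = 0
AF_APPLETALK_DEMO = 1

# One slot per protocol family; each slot holds that family's routes.
# A list of (prefix, next hop) pairs stands in for the kernel's trie.
rt_tables_1d = [None, None]
rt_tables_1d[AF_INET_DEMO] = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2"),
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3"),
]
rt_tables_1d[AF_APPLETALK_DEMO] = []


def lookup(family, dst):
    """Return the next hop of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in rt_tables_1d[family] if addr in net]
    if not matches:
        return None
    # Longest prefix wins, as it would in a real routing lookup.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

Every lookup starts by indexing the array with the protocol family, which is the single step the rest of this proposal builds on.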
The basic change in the ABI-compatible version is to extend that array to be a two-dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X], where Y is always 0 for all protocol families except IPv4. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one-dimensional array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1() and rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added for the exclusive use of IPv4 code, called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called dom_rtalloc() and friends) that check the address family being looked up and either call rtalloc() (and friends) if the protocol is not IPv4, forcing the action to row 0, or go to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non-ABI-preserving code to be added later.

One feature of the first version of the code is that for IPv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic directly attached hosts available to you (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to, but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it.

This brings us to how the correct FIB is selected for an outgoing IPv4 packet.
Packets fall into one of a number of classes.

1/ Locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility called setfib that acts a bit like nice:

    setfib -n 3 ping target.example.com # will use fib 3 for ping.

2/ Packets received on an interface for forwarding. By default these packets would use table 0 (or possibly a number settable in a sysctl (not yet)), but prior to routing the firewall can inspect them (see below).

3/ Packets inspected by a packet classifier, which can arbitrarily associate a FIB with a packet on a packet-by-packet basis. A FIB assigned to a packet by a packet classifier (such as ipfw) would override a FIB assigned by a more default source (such as cases 1 or 2).

Routing messages would be associated with their process, and thus select one FIB or another.

In addition, netstat has been edited to be able to cope with the fact that the array is now two-dimensional. (It looks in system memory using libkvm (!).)

In addition, two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------
Basically our (IronPort's) appliance already provides this functionality using ipfw fwd, but that method has some drawbacks. For example, it can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done.

Testing during the development of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly.

I have not yet added the changes to ipfw. pf has some similar changes already, but they seem to rely on the various FIBs having symbolic names, which I do not plan to support in the first version of these changes.
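
Going back to the two-dimensional table and the three packet classes described earlier, the row selection can be sketched like this. It is only a Python illustration under stated assumptions: NUM_FIBS, the family indices, select_fib() and table_for() are names invented for the example and are not the kernel's API, and each table cell is a placeholder string rather than a real route tree.

```python
NUM_FIBS = 4            # assumed compile-time number of FIBs
AF_INET_DEMO = 0        # hypothetical family indices, not the kernel's values
AF_OTHER_DEMO = 1

# rt_tables[fib][family]; each cell is a placeholder label, not a route tree.
rt_tables = [[f"fib{fib}/family{fam}" for fam in (AF_INET_DEMO, AF_OTHER_DEMO)]
             for fib in range(NUM_FIBS)]


def select_fib(socket_fib=None, classifier_fib=None, forwarding_default=0):
    """Pick a FIB number for a packet.

    A FIB set by a packet classifier (class 3) wins over the socket's
    FIB (class 1), which wins over the forwarding default (class 2).
    """
    if classifier_fib is not None:
        return classifier_fib
    if socket_fib is not None:
        return socket_fib
    return forwarding_default


def table_for(family, fib):
    """Only IPv4 uses the requested row; every other family sees row 0."""
    row = fib if family == AF_INET_DEMO else 0
    return rt_tables[row][family]
```

For example, a packet from a process started under FIB 3 would use table_for(AF_INET_DEMO, select_fib(socket_fib=3)), while any non-IPv4 family still ends up in row 0 whatever FIB number is passed.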
SCTP has, interestingly enough, built-in support for this, called VRFs in Cisco parlance. It will be interesting to see how that handles it when it suddenly actually does something.

I have not redone my testing since my last edits, but will be retesting with the current code ASAP.

Where to next:
--------------------
After committing the ABI-compatible version and MFCing it, I'd like to proceed in a forward direction in -current. This will result in some rototilling in the routing code.

Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the one-dimensional array is a bit silly, especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure (for example, the whole idea of a netmask is foreign to AppleTalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing to the data. Instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods reached this way would be called. The methods would have an argument that gives the FIB number, but the protocol would be free to ignore it.

Interaction with the ARP layer/LL layer would need to be revisited as well. Qing Li has been working on this already.

diffs for those with p4 access:
p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 //depot/user/julian/routing/src/sys/...

for those with the makediff perl script:
perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 //depot/user/julian/routing/src/sys/...

for those with neither:
http://people.freebsd.org/~julian/mrt2.diff

I just put the userland utility in usr.sbin/setfib/ in p4.
and changes to netstat in usr.bin/netstat/ see: http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO I'd like to get comments on this (compat) version, so that I can commit it, get general testing under way to start the clock for MFC, and then get moving on the fuller implementation (that breaks ABIs) and other routing issues. Julian From owner-freebsd-net@FreeBSD.ORG Thu Dec 27 00:26:05 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3BFD16A46C for ; Thu, 27 Dec 2007 00:26:05 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outW.internet-mail-service.net (outW.internet-mail-service.net [216.240.47.246]) by mx1.freebsd.org (Postfix) with ESMTP id CAE3813C474 for ; Thu, 27 Dec 2007 00:26:05 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 26 Dec 2007 16:26:04 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 235D4126D8C; Wed, 26 Dec 2007 16:26:04 -0800 (PST) Message-ID: <4772F123.5030303@elischer.org> Date: Wed, 26 Dec 2007 16:26:11 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: FreeBSD Net , arch@freebsd.org, Robert Watson , Qing Li Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Dec 2007 00:26:06 -0000 Resending as my mailer made a dog's breakfast of the first one with all sorts of wierd line breaks... 
hopefully this will be better.
(I haven't sent it yet so I'm hoping)..


-------------------------------------------



One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.

Implementation method, (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x (also in Perforce, though not yet caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (sufficient for my purposes in 6.x) and
implements the changes needed to allow IPv4 to use them. I have not done
the changes for IPv6 simply because I do not need it, and I do not
have enough knowledge of IPv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
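
One way to picture how other protocol families can stay oblivious to the extra FIBs is a table indexed as rt_tables[Y][X], where legacy lookups only ever touch row 0. The following is a minimal user-space sketch under stated assumptions, not the code in the Perforce branch: every name prefixed with sketch_ or SKETCH_ is hypothetical, the sizes are arbitrary, and the FIB head is reduced to a two-field stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes and family number, chosen only for illustration. */
#define SKETCH_NUM_FIBS 4   /* number of FIBs "compiled in" */
#define SKETCH_AF_MAX   8   /* number of protocol families */
#define SKETCH_AF_INET  2   /* stands in for IPv4 */

/* Stand-in for a per-family FIB head (root of that family's routes). */
struct sketch_fib_head {
    int fib;     /* row (FIB number) this head belongs to */
    int family;  /* column (protocol family) */
};

static struct sketch_fib_head sketch_heads[SKETCH_NUM_FIBS][SKETCH_AF_MAX];

/* sketch_rt_tables[Y][X]: Y is the FIB number, X the protocol family. */
static struct sketch_fib_head *sketch_rt_tables[SKETCH_NUM_FIBS][SKETCH_AF_MAX];

/* Populate row 0 for every family, and the extra rows for IPv4 only. */
void sketch_init(void)
{
    for (int fib = 0; fib < SKETCH_NUM_FIBS; fib++) {
        for (int af = 0; af < SKETCH_AF_MAX; af++) {
            if (fib != 0 && af != SKETCH_AF_INET) {
                sketch_rt_tables[fib][af] = NULL;
                continue;
            }
            sketch_heads[fib][af].fib = fib;
            sketch_heads[fib][af].family = af;
            sketch_rt_tables[fib][af] = &sketch_heads[fib][af];
        }
    }
}

/* Legacy-style lookup: callers unaware of multiple FIBs see only row 0. */
struct sketch_fib_head *sketch_lookup_legacy(int family)
{
    assert(family >= 0 && family < SKETCH_AF_MAX);
    return sketch_rt_tables[0][family];
}

/* FIB-aware lookup: only IPv4 honours the row; others are forced to 0. */
struct sketch_fib_head *sketch_lookup_fib(int family, int fib)
{
    assert(family >= 0 && family < SKETCH_AF_MAX);
    assert(fib >= 0 && fib < SKETCH_NUM_FIBS);
    if (family != SKETCH_AF_INET)
        fib = 0;
    return sketch_rt_tables[fib][family];
}
```

After sketch_init(), a non-IPv4 family gets the same head from both lookup functions whatever FIB number is passed, which is the property the ABI-compatible version relies on.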

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (one per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except IPv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of IPv4 code,
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
dom_rtalloc() and friends) that check the address family being
looked up and either call rtalloc() (and friends) if the protocol
is not IPv4, forcing the action to row 0, or direct it to the
appropriate row if it IS IPv4 (and that info is available). These are
for calling from code that is not specific to any particular protocol.
The way these are implemented would change in the non ABI preserving
code to be added later.

One feature of the first version of the code is that for IPv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there.
ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.


This brings us to how the correct FIB is selected for an outgoing
IPv4 packet.

Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility called setfib
     that acts a bit like nice..

         setfib -n 3 ping target.example.com # will use fib 3 for ping.

2/ packets received on an interface for forwarding.
     By default these packets would use table 0
     (or possibly a number settable in a sysctl (not yet)),
     but prior to routing the firewall can inspect them (see below).

3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would override a fib associated by
     a more default source (such as cases 1 or 2).

Routing messages would be associated with their
process, and thus select one FIB or another.

In addition netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)).

In addition two sysctls are added to give:
     a) the number of FIBs compiled in (active)
     b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
it can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.


Testing during the generating of these changes has been
remarkably smooth so far.
Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.

I have not yet added the changes to ipfw.
pf has some similar changes already but they seem to rely on
the various FIBs having symbolic names, which I do not plan to support
in the first version of these changes.

SCTP has, interestingly enough, built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.

I have not redone my testing since my last edits, but will be
retesting with the current code asap.


Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly, especially when one considers that
there is code that makes assumptions about every protocol having the same
internal structures there. Some protocols don't WANT that
sort of structure (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives the FIB number, but the protocol would be free
to ignore it.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.


diffs
for those with p4 access:
p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 //depot/user/julian/routing/src/sys/...
for those with the makediff perl script: perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 //depot/user/julian/routing/src/sys/... for those with neither: http://people.freebsd.org/~julian/mrt2.diff I just put the userland utility in usr.sbin/setfib/ in p4. and changes to netstat in usr.bin/netstat/ see: http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO I'd like to get comments on this (compat) version, so that I can commit it, get general testing under way to start the clock for MFC, and then get moving on the fuller implementation (that breaks ABIs) and other routing issues. Julian From owner-freebsd-net@FreeBSD.ORG Thu Dec 27 01:18:52 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A48B316A419; Thu, 27 Dec 2007 01:18:52 +0000 (UTC) (envelope-from jcmichot@flash.usenet-fr.net) Received: from flash.usenet-fr.net (flash.usenet-fr.net [88.174.64.5]) by mx1.freebsd.org (Postfix) with ESMTP id 5B2A313C469; Thu, 27 Dec 2007 01:18:52 +0000 (UTC) (envelope-from jcmichot@flash.usenet-fr.net) Received: by flash.usenet-fr.net (Postfix, from userid 200) id D1E6895BD3; Thu, 27 Dec 2007 02:18:50 +0100 (CET) Date: Thu, 27 Dec 2007 02:18:50 +0100 From: Jean-Claude MICHOT To: Sten Daniel Soersdal Message-ID: <20071227011850.GA43415@flash.usenet-fr.net> References: <20071226180327.GA39735@flash.usenet-fr.net> <4772AA7C.1020206@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4772AA7C.1020206@gmail.com> User-Agent: Mutt/1.4.2.2i Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Jean-Claude MICHOT Subject: Re: DELL PowerEdge 860 and Broadcom Gigabit Ethernet poor performance. 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Dec 2007 01:18:52 -0000

On Wed, Dec 26, 2007 at 02:24:44PM -0500, Sten Daniel Soersdal said:
> Jean-Claude MICHOT wrote:
> >The server is a DELL PowerEdge 860 freshly installed with
> >FreeBSD 7.0-BETA4 (GENERIC Kernel).
> >
> >There's no problem with input throughput (upto 980 Mbits) but output
> >throughput never go upper to 540 Mbits :(
> >
> >Just for test, i have add to this server an Intel Gigabit Ethernet board
> >(em) and there's no problem to output data up to around 980 Mbits with
> >this addon board.
> >
> >If i boot the server with Linux Ubuntu, there's no output throughput
> >problem with Broadcom, so it seem to be FreeBSD bge driver problem.
> >
> >I'm not the only one to have poor output performance with bge on
> >DELL PowerEdge 860
> >http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014373.html
> >
> >I have also try various patch or setup driver default values
> >http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015951.html
> >http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015956.html
> >
> >But all theses attempts to get better ouput performance was unsuccessful :(
> >
> >Any idea are welcome.
> >
> >PS: If it's usefull to debug and try to fix the problem, i can provide
> > root access to a DELL PE860 test server with bge.
>
> Have you tried setting the tcp send and receive windows?
> The defaults are:
>
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
>
> Also you might want to try to lower:
>
> net.inet.tcp.delacktime: 100

Yes, no difference.

But as I said, with the same FreeBSD "config" and an Intel em board
instead of the Broadcom bge, there's no output throughput problem.

JC
--
"Those people who think they know everything are a great annoyance to those of us who do."
Isaac Asimov From owner-freebsd-net@FreeBSD.ORG Thu Dec 27 01:28:04 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17CBA16A419 for ; Thu, 27 Dec 2007 01:28:04 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: from hs-out-2122.google.com (hs-out-0708.google.com [64.233.178.240]) by mx1.freebsd.org (Postfix) with ESMTP id 669D513C447 for ; Thu, 27 Dec 2007 01:28:03 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: by hs-out-2122.google.com with SMTP id j58so2219948hsj.11 for ; Wed, 26 Dec 2007 17:28:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=7P9byjzIODk7OjOMT/p8QTPLAQm1t3t9xirlo8LCWbQ=; b=PgbD/339ykW1ubzBfVxjCIypmFOJpiEuqCKRGhdWgsw4+US0lzwwBV0DhQVivKkvIZtVJ6eFnugyyE4NuAJzoWI4YwWTKinWwYWo50sCifDyb0/ifRIp0XD1kxj6XbigSIwz0OwRp4YLbDkx2j2jKyU0m6I+Rs/v1zIAQc3/pZI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=jG1MNPOqYXnEan6K0C/XRO7sZWyrhrffthB7BWg91ZXxxcCOUwrQ92wpaUVByN3NI9ROYbPLyfbsz5yu80Soq4JzBAwu/9++82q3m09H0xmcdPj7tHFPm8aU3KrWD5Z7ELHXAACt8viEkrRAkrgmfhyS1ZfbAZkkBaNvbHlq2gE= Received: by 10.150.229.16 with SMTP id b16mr1961399ybh.115.1198718881884; Wed, 26 Dec 2007 17:28:01 -0800 (PST) Received: by 10.150.204.13 with HTTP; Wed, 26 Dec 2007 17:28:01 -0800 (PST) Message-ID: Date: Thu, 27 Dec 2007 03:28:01 +0200 From: "Ivo Vachkov" To: "Julian Elischer" In-Reply-To: <4772F123.5030303@elischer.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <4772F123.5030303@elischer.org> 
Cc: FreeBSD Net , Robert Watson , Qing Li , arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Dec 2007 01:28:04 -0000 On Dec 27, 2007 2:26 AM, Julian Elischer wrote: > Resending as my mailer made a dog's breakfast of the first one > with all sorts of wierd line breaks... hopefully this will be better. > (I haven't sent it yet so I'm hoping).. > > > ------------------------------------------- > > > > On thing where FreeBSD has been falling behind, and which by chance I > have some time to work on is "policy based routing", which allows > different > packet streams to be routed by more than just the destination address. > > Constraints: > ------------ > > I want to make some form of this available in the 6.x tree > (and by extension 7.x) , but FreeBSD in general needs it so I might as > well > do it in -current and back port the portions I need. > > One of the ways that this can be done is to have the ability to > instantiate multiple kernel routing tables (which I will now > refer to as "Forwarding Information Bases" or "FIBs" for political > correctness reasons. Which FIB a particular packet uses to make > the next hop decision can be decided by a number of mechanisms. > The policies these mechanisms implement are the "Policies" referred > to in "Policy based routing". > > One of the constraints I have if I try to back port this work to > 6.x is that it must be implemented as a EXTENSION to the existing > ABIs in 6.x so that third party applications do not need to be > recompiled in timespan of the branch. 
> > Implementation method, (part 1) > ------------------------------- > For this reason I have implemented a "sufficient subset" of a > multiple routing table solution in Perforce, and back-ported it > to 6.x. (also in Perforce though not yet caught up with what I > have done in -current/P4). The subset allows a number of FIBs > to be defined at compile time (sufficient for my purposes in 6.x) and > implements the changes needed to allow IPV4 to use them. I have not done > the changes for ipv6 simply because I do not need it, and I do not > have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. > > Other protocol families are left untouched and should there be > users with proprietary protocol families, they should continue to work > and be oblivious to the existence of the extra FIBs. > > To understand how this is done, one must know that the current FIB > code starts everything off with a single dimensional array of > pointers to FIB head structures (One per protocol family), each of > which in turn points to the trie of routes available to that family. > > The basic change in the ABI compatible version of the change is to > extent that array to be a 2 dimensional array, so that > instead of protocol family X looking at rt_tables[X] for the > table it needs, it looks at rt_tables[Y][X] when for all > protocol families except ipv4 Y is always 0. > Code that is unaware of the change always just sees the first row > of the table, which of course looks just like the one dimensional > array that existed before. Pretty much like the OpenBSD approach :) > The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() > are all maintained, but refer only to the first row of the array, > so that existing callers in proprietary protocols can continue to > do the "right thing". 
> Some new entry points are added, for the exclusive use of ipv4 code > called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), > which have an extra argument which refers the code to the correct row. > > In addition, there are some new entry points (currently called > dom_rtalloc() and friends) that check the Address family being > looked up and call either rtalloc() (and friends) if the protocol > is not IPv4 forcing the action to row 0 or to the appropriate row > if it IS IPv4 (and that info is available). These are for calling > from code that is not specific to any particular protocol. The way > these are implemented would change in the non ABI preserving code > to be added later. > > One feature of the first version of the code is that for ipv4, > the interface routes show up automatically on all the FIBs, so > that no matter what FIB you select you always have the basic > direct attached hosts available to you. (rtinit() does this > automatically). > You CAN delete an interface route from one FIB should you want > to but by default it's there. ARP information is also available > in each FIB. It's assumed that the same machine would have the > same MAC address, regardless of which FIB you are using to get > to it. > > > This brings us as to how the correct FIB is selected for an outgoing > IPV4 packet. > > Packets fall into one of a number of classes. > 1/ locally generated packets, coming from a socket/PCB. > Such packets select a FIB from a number associated with the > socket/PCB. This in turn is inherited from the process, > but can be changed by a socket option. The process in turn > inherits it on fork. I have written a utility call setfib > that acts a bit like nice.. > > setfib -n 3 ping target.example.com # will use fib 3 for ping. > > 2/ packets received on an interface for forwarding. > By default these packets would use table 0, > (or possibly a number settable in a sysctl(not yet)). 
> but prior to routing the firewall can inspect them (see below). > > 3/ packets inspected by a packet classifier, which can arbitrarily > associate a fib with it on a packet by packet basis. > A fib assigned to a packet by a packet classifier > (such as ipfw) would over-ride a fib associated by > a more default source. (such as cases 1 or 2). For the 2/ and 3/ cases I added (in a personal work i've been doing lately) additional field in struct mbuf which can be set by a packet filter or other application upon receiving which points the right table to use for the lookup. This way a simple "marking" can be used to divide different flows and create policy based routing. > Routing messages would be associated with their > process, and thus select one FIB or another. > > In addition Netstat has been edited to be able to cope with the > fact that the array is now 2 dimensional. (It looks in system > memory using libkvm (!)). > > In addition two sysctls are added to give: > a) the number of FIBs compiled in (active) > b) the default FIB of the calling process. > > Early testing experience: > ------------------------- > > Basically our (IronPort's) appliance does this functionality already > using ipfw fwd but that method has some drawbacks. > > For example, > It can't fully simulate a routing table because it can't influence the > socket's choice of local address when a connect() is done. > > > Testing during the generating of these changes has been > remarkably smooth so far. Multiple tables have co-existed > with no notable side effects, and packets have been routes > accordingly. > > I have not yet added the changes to ipfw. > pf has some similar changes already but they seem to rely on > the various FIBs having symbolic names. Which I do not plan to support > in the first version of these changes. > > SCTP has interestingly enough built in support for this, called VRFs > in Cisco parlance. 
it will be interesting to see how that handles it > when it suddenly actually does something. > > I have not redone my testing since my last edits, but will be > retesting with the current code asap. > > > Where to next: > -------------------- > > After committing the ABI compatible version and MFCing it, I'd > like to proceed in a forward direction in -current. this will > result in some roto-tilling in the routing code. > > Firstly: the current code's idea of having a separate tree per > protocol family, all of the same format, and pointed to by the > 1 dimensional array is a bit silly. Especially when one considers that > there > is code that makes assumptions about every protocol having the same > internal structures there. Some protocols don't WANT that > sort of structure. (for example the whole idea of a netmask is foreign > to appletalk). This needs to be made opaque to the external code. > > My suggested first change is to add routing method pointers to the > 'domain' structure, along with information pointing the data. > instead of having an array of pointers to uniform structures, > there would be an array pointing to the 'domain' structures > for each protocol address domain (protocol family), > and the methods this reached would be called. The methods would have > an argument that gives FIB number, but the protocol would be free > to ignore it. > > Interaction with the ARP layer/ LL layer would need to be > revisited as well. Qing Li has been working on this already. > > > diffs > for those with p4 access: > p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 > //depot/user/julian/routing/src/sys/... > > for those with the makediff perl script: > perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 > //depot/user/julian/routing/src/sys/... > > for those with neither: > > http://people.freebsd.org/~julian/mrt2.diff > > I just put the userland utility in usr.sbin/setfib/ in p4. 
> and changes to netstat in usr.bin/netstat/ > > see: > http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO > > > > > I'd like to get comments on this (compat) version, so that I can > commit it, > get general testing under way to start the clock for MFC, and then get > moving on the fuller implementation (that breaks ABIs) and other > routing issues. > > > Julian > > > > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Thu Dec 27 21:19:02 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B819D16A419 for ; Thu, 27 Dec 2007 21:19:02 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outL.internet-mail-service.net (outL.internet-mail-service.net [216.240.47.235]) by mx1.freebsd.org (Postfix) with ESMTP id 92BF113C467 for ; Thu, 27 Dec 2007 21:19:02 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 27 Dec 2007 13:19:01 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 75B50126D9D; Thu, 27 Dec 2007 13:19:00 -0800 (PST) Message-ID: <477416CC.4090906@elischer.org> Date: Thu, 27 Dec 2007 13:19:08 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Ivo Vachkov References: <4772F123.5030303@elischer.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , Robert Watson , Qing Li , arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: 
freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Dec 2007 21:19:02 -0000 Ivo Vachkov wrote: > On Dec 27, 2007 2:26 AM, Julian Elischer wrote: >> Resending as my mailer made a dog's breakfast of the first one >> with all sorts of wierd line breaks... hopefully this will be better. >> (I haven't sent it yet so I'm hoping).. >> >> >> ------------------------------------------- >> >> >> >> On thing where FreeBSD has been falling behind, and which by chance I >> have some time to work on is "policy based routing", which allows >> different >> packet streams to be routed by more than just the destination address. >> >> Constraints: >> ------------ >> >> I want to make some form of this available in the 6.x tree >> (and by extension 7.x) , but FreeBSD in general needs it so I might as >> well >> do it in -current and back port the portions I need. >> >> One of the ways that this can be done is to have the ability to >> instantiate multiple kernel routing tables (which I will now >> refer to as "Forwarding Information Bases" or "FIBs" for political >> correctness reasons. Which FIB a particular packet uses to make >> the next hop decision can be decided by a number of mechanisms. >> The policies these mechanisms implement are the "Policies" referred >> to in "Policy based routing". >> >> One of the constraints I have if I try to back port this work to >> 6.x is that it must be implemented as a EXTENSION to the existing >> ABIs in 6.x so that third party applications do not need to be >> recompiled in timespan of the branch. >> >> Implementation method, (part 1) >> ------------------------------- >> For this reason I have implemented a "sufficient subset" of a >> multiple routing table solution in Perforce, and back-ported it >> to 6.x. 
(also in Perforce though not yet caught up with what I
>> have done in -current/P4). The subset allows a number of FIBs
>> to be defined at compile time (sufficient for my purposes in 6.x) and
>> implements the changes needed to allow IPV4 to use them. I have not done
>> the changes for ipv6 simply because I do not need it, and I do not
>> have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

By the way, I might add that in the 6.x compat. version I may end up
limiting the feature to 8 tables. This is because I need to store some
stuff in an efficient way in the mbuf, and in a compatible manner this
is easiest done by stealing the top 4 bits in the mbuf flags word
and defining them as:

#define M_HAVEFIB   0x10000000
#define M_FIBMASK   0x07
#define M_FIBNUM    0xe0000000
#define M_FIBSHIFT  29
#define m_getfib(_m, _default) \
        (((_m)->m_flags & M_HAVEFIB) ? \
        (((_m)->m_flags >> M_FIBSHIFT) & M_FIBMASK) : (_default))
#define M_SETFIB(_m, _fib) do { \
        (_m)->m_flags &= ~M_FIBNUM; \
        (_m)->m_flags |= (M_HAVEFIB | (((_fib) & M_FIBMASK) << M_FIBSHIFT)); \
} while (0)

This then becomes very easy to change to use a tag or whatever is needed
in later versions, and the number can be expanded past 8 predefined
FIBs at that time.

>>
>> Other protocol families are left untouched and should there be
>> users with proprietary protocol families, they should continue to work
>> and be oblivious to the existence of the extra FIBs.
>>
>> To understand how this is done, one must know that the current FIB
>> code starts everything off with a single dimensional array of
>> pointers to FIB head structures (One per protocol family), each of
>> which in turn points to the trie of routes available to that family.
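
The flag-stealing idea from the reply above (one "have FIB" bit plus a 3-bit FIB number in the top of the flags word) can be tried out in user space. This is only a sketch under stated assumptions: struct sketch_mbuf is a hypothetical one-field stand-in for the real mbuf, the SK_ constants copy the values from the message under new names, and the flags word is made unsigned here so the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as in the message, under hypothetical SK_ names. */
#define SK_M_HAVEFIB  0x10000000u  /* bit 28: a FIB number is present */
#define SK_M_FIBNUM   0xe0000000u  /* bits 29..31: the FIB number itself */
#define SK_M_FIBMASK  0x07u        /* 3 bits, hence at most 8 FIBs */
#define SK_M_FIBSHIFT 29

/* Hypothetical stand-in for struct mbuf; only the flags word matters. */
struct sketch_mbuf {
    uint32_t m_flags;
};

/* Store a FIB number in the top bits, leaving all other flags alone. */
void sketch_setfib(struct sketch_mbuf *m, unsigned fib)
{
    m->m_flags &= ~SK_M_FIBNUM;
    m->m_flags |= SK_M_HAVEFIB | ((fib & SK_M_FIBMASK) << SK_M_FIBSHIFT);
}

/* Return the stored FIB number, or dflt if none was ever stored. */
unsigned sketch_getfib(const struct sketch_mbuf *m, unsigned dflt)
{
    if ((m->m_flags & SK_M_HAVEFIB) == 0)
        return dflt;
    return (m->m_flags >> SK_M_FIBSHIFT) & SK_M_FIBMASK;
}
```

The separate "have FIB" bit is what lets FIB 0 be told apart from "no FIB assigned", at the cost of one of the four stolen bits. Values above 7 are silently masked down, which matches the 8-table limit mentioned in the message.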
>> >> The basic change in the ABI compatible version of the change is to >> extent that array to be a 2 dimensional array, so that >> instead of protocol family X looking at rt_tables[X] for the >> table it needs, it looks at rt_tables[Y][X] when for all >> protocol families except ipv4 Y is always 0. >> Code that is unaware of the change always just sees the first row >> of the table, which of course looks just like the one dimensional >> array that existed before. > > Pretty much like the OpenBSD approach :) well, I did look at the code briefly, but I didn't base it on it.. > >> The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() >> are all maintained, but refer only to the first row of the array, >> so that existing callers in proprietary protocols can continue to >> do the "right thing". >> Some new entry points are added, for the exclusive use of ipv4 code >> called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), >> which have an extra argument which refers the code to the correct row. >> >> In addition, there are some new entry points (currently called >> dom_rtalloc() and friends) that check the Address family being >> looked up and call either rtalloc() (and friends) if the protocol >> is not IPv4 forcing the action to row 0 or to the appropriate row >> if it IS IPv4 (and that info is available). These are for calling >> from code that is not specific to any particular protocol. The way >> these are implemented would change in the non ABI preserving code >> to be added later. >> >> One feature of the first version of the code is that for ipv4, >> the interface routes show up automatically on all the FIBs, so >> that no matter what FIB you select you always have the basic >> direct attached hosts available to you. (rtinit() does this >> automatically). >> You CAN delete an interface route from one FIB should you want >> to but by default it's there. ARP information is also available >> in each FIB. 
It's assumed that the same machine would have the
>> same MAC address, regardless of which FIB you are using to get
>> to it.
>>
>>
>> This brings us as to how the correct FIB is selected for an outgoing
>> IPV4 packet.
>>
>> Packets fall into one of a number of classes.
>> 1/ locally generated packets, coming from a socket/PCB.
>> Such packets select a FIB from a number associated with the
>> socket/PCB. This in turn is inherited from the process,
>> but can be changed by a socket option. The process in turn
>> inherits it on fork. I have written a utility call setfib
>> that acts a bit like nice..
>>
>> setfib -n 3 ping target.example.com # will use fib 3 for ping.
>>
>> 2/ packets received on an interface for forwarding.
>> By default these packets would use table 0,
>> (or possibly a number settable in a sysctl(not yet)).
>> but prior to routing the firewall can inspect them (see below).
>>
>> 3/ packets inspected by a packet classifier, which can arbitrarily
>> associate a fib with it on a packet by packet basis.
>> A fib assigned to a packet by a packet classifier
>> (such as ipfw) would over-ride a fib associated by
>> a more default source. (such as cases 1 or 2).
>
> For the 2/ and 3/ cases I added (in a personal work i've been doing
> lately) additional field in struct mbuf which can be set by a packet
> filter or other application upon receiving which points the right
> table to use for the lookup. This way a simple "marking" can be used
> to divide different flows and create policy based routing.

This would be the final way, but I really want to minimise problems
in the compat versions, so I'll avoid doing that for now.
Do you have this work available? And have you looked at my diffs below?

>
>> Routing messages would be associated with their
>> process, and thus select one FIB or another.
>>
>> In addition Netstat has been edited to be able to cope with the
>> fact that the array is now 2 dimensional. (It looks in system
>> memory using libkvm (!)).
>> >> In addition two sysctls are added to give: >> a) the number of FIBs compiled in (active) >> b) the default FIB of the calling process. >> >> Early testing experience: >> ------------------------- >> >> Basically our (IronPort's) appliance does this functionality already >> using ipfw fwd but that method has some drawbacks. >> >> For example, >> It can't fully simulate a routing table because it can't influence the >> socket's choice of local address when a connect() is done. >> >> >> Testing during the generating of these changes has been >> remarkably smooth so far. Multiple tables have co-existed >> with no notable side effects, and packets have been routes >> accordingly. >> >> I have not yet added the changes to ipfw. >> pf has some similar changes already but they seem to rely on >> the various FIBs having symbolic names. Which I do not plan to support >> in the first version of these changes. >> >> SCTP has interestingly enough built in support for this, called VRFs >> in Cisco parlance. it will be interesting to see how that handles it >> when it suddenly actually does something. >> >> I have not redone my testing since my last edits, but will be >> retesting with the current code asap. >> >> >> Where to next: >> -------------------- >> >> After committing the ABI compatible version and MFCing it, I'd >> like to proceed in a forward direction in -current. this will >> result in some roto-tilling in the routing code. >> >> Firstly: the current code's idea of having a separate tree per >> protocol family, all of the same format, and pointed to by the >> 1 dimensional array is a bit silly. Especially when one considers that >> there >> is code that makes assumptions about every protocol having the same >> internal structures there. Some protocols don't WANT that >> sort of structure. (for example the whole idea of a netmask is foreign >> to appletalk). This needs to be made opaque to the external code. 
>> >> My suggested first change is to add routing method pointers to the >> 'domain' structure, along with information pointing the data. >> instead of having an array of pointers to uniform structures, >> there would be an array pointing to the 'domain' structures >> for each protocol address domain (protocol family), >> and the methods this reached would be called. The methods would have >> an argument that gives FIB number, but the protocol would be free >> to ignore it. >> >> Interaction with the ARP layer/ LL layer would need to be >> revisited as well. Qing Li has been working on this already. >> >> >> diffs >> for those with p4 access: >> p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 >> //depot/user/julian/routing/src/sys/... >> >> for those with the makediff perl script: >> perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 >> //depot/user/julian/routing/src/sys/... >> >> for those with neither: >> >> http://people.freebsd.org/~julian/mrt2.diff >> >> I just put the userland utility in usr.sbin/setfib/ in p4. >> and changes to netstat in usr.bin/netstat/ >> >> see: >> http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO >> >> >> >> >> I'd like to get comments on this (compat) version, so that I can >> commit it, >> get general testing under way to start the clock for MFC, and then get >> moving on the fuller implementation (that breaks ABIs) and other >> routing issues. 
>> >> >> Julian >> >> >> >> >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >> From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 04:36:47 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8EB5216A420; Fri, 28 Dec 2007 04:36:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au [211.29.132.197]) by mx1.freebsd.org (Postfix) with ESMTP id 30BB213C459; Fri, 28 Dec 2007 04:36:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBS4adHL019236 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 28 Dec 2007 15:36:42 +1100 Date: Fri, 28 Dec 2007 15:36:39 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Fullmer In-Reply-To: <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net> Message-ID: <20071228143411.C3587@besplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 04:36:47 -0000 On Sat, 22 Dec 2007, Mark Fullmer wrote: > On Dec 22, 2007, at 
12:08 PM, Bruce Evans wrote:
>>
>> I still don't understand the original problem, that the kernel is not
>> even preemptible enough for network interrupts to work (except in 5.2
>> where Giant breaks things). Perhaps I misread the problem, and it is
>> actually that networking works but userland is unable to run in time
>> to avoid packet loss.
>
> The test is done with UDP packets between two servers. The em
> driver is incrementing the received packet count correctly but
> the packet is not making it up the network stack. If
> the application was not servicing the socket fast enough I would
> expect to see the "dropped due to full socket buffers" (udps_fullsock)
> counter incrementing, as shown by netstat -s.

I couldn't see any sign of PREEMPTION not working in 6.3-PRERELEASE.
em seemed to keep up with the maximum rate that I can easily generate
(640 kpps with tiny UDP packets), though it cannot transmit at more than
400 kpps on the same hardware. This is without any syncer activity to
cause glitches. The rest of the system couldn't keep up, and with my
normal configuration of net.isr.direct=1, systat -ip (udps_fullsock)
showed too many packets being dropped, but all the numbers seemed to add
up right. (I didn't do end-to-end packet counts. I'm using ttcp to send
and receive packets; the receiver loses so many packets that it rarely
terminates properly, and when it does terminate it always shows many
dropped.) However, with net.isr.direct=0, packets are dropped with no
sign of the problem except a reduced count of good packets in systat -ip.

Packet rate counter       net.isr.direct=1   net.isr.direct=0
-------------------       ----------------   ----------------
netstat -I                639042             643522 (faster later)
systat -ip (total rx)     639042             382567 (dropped many b4 here)
           (UDP total)    639042             382567
           (udps_fullsock) 298911            70340
           (diff of prev 2) 340031           312227 (300+k always dropped)
net.isr.count             small              large (seems to be correct 643k)
net.isr.directed          large (correct?)   no change
net.isr.queued            0                  0
net.isr.drop              0                  0

net.isr.direct=0 is apparently causing dropped packets without even
counting them. However, the drop seems to be below the netisr level.

More worryingly, with full 1500-byte packets (1472 data + 28 IP and UDP
header), packets can be sent at a rate of 76 kpps (nearly 950 Mbps) with
a load of only 80% on the receiver, yet the ttcp receiver still drops
about 1000 pps due to "socket buffer full". With net.isr.direct=0 it
drops an additional 700 pps due to this. Glitches from sync(2) taking
25 ms increase the loss by about 1000 packets, and using rtprio for the
ttcp receiver doesn't seem to help at all.

In previous mail, you (Mark) wrote:

# With FreeBSD 4 I was able to run a UDP data collector with rtprio set,
# kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
# in the application. If packets were dropped they would show up
# with netstat -s as "dropped due to full socket buffers".
#
# Since the packet never makes it to ip_input() I no longer have
# any way to count drops. There will always be corner cases where
# interrupts are lost and drops not accounted for if the adapter
# hardware can't report them, but right now I've got no way to
# estimate any loss.

I tried using SO_RCVBUF in ttcp (it's an old version of ttcp that doesn't
have an option for this). With the default kern.ipc.maxsockbuf of 256K,
this didn't seem to help. 20MB should work better :-) but I didn't try
that. I don't understand how fast the socket buffer fills up and would
have thought that 256K was enough for tiny packets but not for 1500-byte
packets. There seems to be a general problem that 1Gbps NICs have or
should have rings of size >= 256 or 512 so that they aren't forced to
drop packets when their interrupt handler has a reasonable but larger
latency, yet if we actually use this feature then we flood the upper
layers with hundreds of packets and fill up socket buffers etc. there.
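
A back-of-envelope sketch of how fast a receive buffer can fill. It assumes (this is an assumption, not something measured in this thread) that every queued datagram is charged a fixed number of bytes against the buffer and that the receiver falls behind by a constant packet rate. The function name fill_time_us() and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds until a socket buffer of buf_bytes overflows when packets
 * arrive excess_pps faster than the application drains them and each
 * packet is charged bytes_per_pkt against the buffer.
 * This is plain arithmetic, not a model of the kernel's real accounting.
 */
static uint64_t
fill_time_us(uint64_t buf_bytes, uint64_t bytes_per_pkt, uint64_t excess_pps)
{
    assert(bytes_per_pkt > 0 && excess_pps > 0);

    uint64_t pkts = buf_bytes / bytes_per_pkt;    /* packets that fit */

    return (pkts * 1000000ULL / excess_pps);
}
```

For example, fill_time_us(262144, 1500, 1000) gives 174000: a 256K buffer holds 174 full-size packets, so a receiver that is 1000 pps short has about 174 ms of slack. If tiny packets were charged 256 bytes each (an assumed figure) and the receiver were 300 kpps short, the same buffer would last roughly 3.4 ms.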
Bruce From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 05:23:52 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C3BC16A417; Fri, 28 Dec 2007 05:23:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id A7B7A13C467; Fri, 28 Dec 2007 05:23:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBS5NfSv022528 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 28 Dec 2007 16:23:44 +1100 Date: Fri, 28 Dec 2007 16:23:40 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20071228143411.C3587@besplex.bde.org> Message-ID: <20071228155323.X3858@besplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net> <20071228143411.C3587@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , freebsd-stable@FreeBSD.org, freebsd-net@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 05:23:52 -0000 On Fri, 28 Dec 2007, Bruce Evans wrote: > In previous mail, you (Mark) wrote: > > # With FreeBSD 4 I was able to run a UDP data collector with rtprio set, > # kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF > # in the application. 
If packets were dropped they would show up
> # with netstat -s as "dropped due to full socket buffers".
> #
> # Since the packet never makes it to ip_input() I no longer have
> # any way to count drops. There will always be corner cases where
> # interrupts are lost and drops not accounted for if the adapter
> # hardware can't report them, but right now I've got no way to
> # estimate any loss.
>
> I tried using SO_RCVBUF in ttcp (it's an old version of ttcp that doesn't
> have an option for this). With the default kern.ipc.maxsockbuf of 256K,
> this didn't seem to help. 20MB should work better :-) but I didn't try that.

I've now tried this. With kern.ipc.maxsockbuf=20480000 (~20MB) and an
SO_RCVBUF of 0x1000000 (16MB), the "socket buffer full" lossage increases
from ~300 kpps (~47%) to ~450 kpps (70%) with tiny packets. I think this
is caused by most accesses to the larger buffer being cache misses (since
the system can't keep up, cache misses make it worse). However, with
1500-byte packets, the larger buffer reduces the lossage from 1 kpps in
76 kpps to precisely zero pps, at a cost of only a small percentage of
system overhead (~20%Idle to ~18%Idle).

The above is with net.isr.direct=1. With net.isr.direct=0, the loss is
too small to be obvious and is reported as 0, but I don't trust the
report. ttcp's packet counts indicate losses of a few per million with
direct=0 but none with direct=1. "while :; do sync; sleep 0.1; done" in
the background causes a loss of about 100 pps with direct=0 and a smaller
loss with direct=1. Running the ttcp receiver at rtprio 0 doesn't make
much difference to the losses.
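
For anyone wanting to repeat the SO_RCVBUF experiment in their own receiver, a minimal sketch using only the standard setsockopt(2)/getsockopt(2) calls. The helper name set_rcvbuf() is made up here; it is not taken from ttcp.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask for a receive buffer of `want` bytes on socket `s` and return the
 * size the kernel reports afterwards, or -1 if either call fails.
 * The reported size may differ from the requested one on some systems.
 */
static int
set_rcvbuf(int s, int want)
{
    int got = 0;
    socklen_t len = sizeof(got);

    assert(want > 0);
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
        return (-1);
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        return (-1);
    return (got);
}
```

A request larger than the kern.ipc.maxsockbuf limit should be expected to fail, which is why the limit was raised before asking for 16MB above. Checking the return value keeps a receiver from silently running with the default size.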
Bruce From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 06:49:23 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4864716A419; Fri, 28 Dec 2007 06:49:23 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id C4A5F13C44B; Fri, 28 Dec 2007 06:49:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBS6nGth032192 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 28 Dec 2007 17:49:18 +1100 Date: Fri, 28 Dec 2007 17:49:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20071228155323.X3858@besplex.bde.org> Message-ID: <20071228170151.C4166@besplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net> <20071228143411.C3587@besplex.bde.org> <20071228155323.X3858@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kostik Belousov , freebsd-stable@FreeBSD.org, freebsd-net@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 06:49:23 -0000 On Fri, 28 Dec 2007, Bruce Evans wrote: > On Fri, 28 Dec 2007, Bruce Evans wrote: > >> In previous mail, you (Mark) wrote: >> >> # With FreeBSD 4 I was able to run a UDP data collector with rtprio set, >> # kern.ipc.maxsockbuf=20480000, then use 
setsockopt() with SO_RCVBUF
>> # in the application. If packets were dropped they would show up
>> # with netstat -s as "dropped due to full socket buffers".
>> #
>> # Since the packet never makes it to ip_input() I no longer have
>> # any way to count drops. There will always be corner cases where
>> # interrupts are lost and drops not accounted for if the adapter
>> # hardware can't report them, but right now I've got no way to
>> # estimate any loss.

I found where drops are recorded for the net.isr.direct=0 case. It is
in net.inet.ip.intr_queue_drops. The netisr subsystem just calls
IF_HANDOFF(), and IF_HANDOFF() calls _IF_DROP() if the queue fills up.
_IF_DROP(ifq) just increments ifq->ifq_drops. The usual case for netisrs
is for the queue to be ipintrq for NETISR_IP.

The following details don't help:

- drops for input queues don't seem to be displayed by any utilities
  (except ones for ipintrq are displayed primitively by
  sysctl net.inet.ip.intr_queue_drops). netstat and systat only display
  drops for send queues and ip frags.

- the netisr subsystem's drop count doesn't seem to be displayed by any
  utilities except sysctl. It only counts drops due to there not being a
  queue; other drops are counted by _IF_DROP() in the per-queue counter.
  Users have a hard time integrating all these primitively displayed drop
  counts with other error counters.

- the length of ipintrq defaults to the default ifq length of
  ipqmaxlen = IPQ_MAXLEN = 50. This is inadequate if there is just one
  NIC in the system that has an rx ring size of about 50 or more. But
  1 Gbps NICs should have an rx ring size of 256 or 512 (I think the size
  is 256 for em; it is 256 for bge due to bogus configuration of hardware
  that can handle it being 512). If the larger hardware rx ring is
  actually used, then ipintrq drops are almost ensured in the direct=0
  case, so using the larger h/w ring is worse than useless (it also
  increases cache misses). This is for just one NIC. This problem is
  often limited by handling rx packets in small bursts, at a cost of
  extra overhead. Interrupt moderation increases it by increasing burst
  sizes.

This contrasts with the handling of send queues. Send queues are
per-interface and most drivers increase the default length from 50 to
their ring size (-1 for bogus reasons). I think this is only an
optimization, while a similar change for rx queues is important for
avoiding packet loss. For send queues, the ifq acts mainly as a
primitive implementation of watermarks. I have found that tx queue
lengths need to be more like 5000 than 50 or 500 to provide enough
buffering when applications are delayed by other applications or just by
sleeping until the next clock tick, and use tx queues of length ~20000
(a couple of clock ticks at HZ = 100), but now think queue lengths should
be restricted to more like 50 since long queues cannot fit in L2 caches
(not to mention they are bad for latency).

The length of ipintrq can be changed using
sysctl net.inet.ip.intr_queue_maxlen. Changing it from 50 to 1024 turns
most or all ipintrq drops into "socket buffer full" drops (640 kpps input
packets and 434 kpps socket buffer fulls with direct=0; 640 kpps input
packets and 324 kpps socket buffer fulls with direct=1).
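
A toy userland model of the queue behaviour described above, to show why a burst from a 256-entry rx ring overruns a 50-entry input queue. This is not the kernel code: struct inq, inq_handoff() and inq_burst() are hypothetical names, and the model assumes the consumer does not run at all while the burst is being queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded input queue with a drop counter. */
struct inq {
    size_t len;             /* packets currently queued */
    size_t maxlen;          /* queue limit, e.g. 50 */
    unsigned long drops;    /* packets rejected because the queue was full */
};

/* Offer one packet; return 1 if it was queued, 0 if dropped and counted. */
static int
inq_handoff(struct inq *q)
{
    assert(q != NULL);
    if (q->len >= q->maxlen) {
        q->drops++;
        return (0);
    }
    q->len++;
    return (1);
}

/* Offer n packets back to back; return how many were queued. */
static size_t
inq_burst(struct inq *q, size_t n)
{
    size_t queued = 0;

    while (n-- > 0)
        queued += (size_t)inq_handoff(q);
    return (queued);
}
```

In this model a 256-packet burst into an empty queue with maxlen 50 queues 50 packets and counts 206 drops, while the same burst into a queue with maxlen 1024 is queued in full. The only place the loss is visible is the per-queue drops counter, which matches the observation that the usual tools do not show it.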
Bruce From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 14:56:35 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E3B516A46E; Fri, 28 Dec 2007 14:56:35 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from outbound0.mx.meer.net (outbound0.mx.meer.net [209.157.153.23]) by mx1.freebsd.org (Postfix) with ESMTP id 21F6E13C4DD; Fri, 28 Dec 2007 14:56:34 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [209.157.152.14]) by outbound0.sv.meer.net (8.12.10/8.12.6) with ESMTP id lBSDnHih047757; Fri, 28 Dec 2007 05:49:17 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (61.204.211.246.customerlink.pwd.ne.jp [61.204.211.246]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id lBSDnGBQ048390; Fri, 28 Dec 2007 05:49:16 -0800 (PST) (envelope-from gnn@neville-neil.com) Date: Fri, 28 Dec 2007 22:49:15 +0900 Message-ID: From: gnn@freebsd.org To: Julian Elischer In-Reply-To: <4772F123.5030303@elischer.org> References: <4772F123.5030303@elischer.org> User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.1.50 (i386-apple-darwin8.10.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: FreeBSD Net , Robert Watson , Qing Li , arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 14:56:35 -0000 At Wed, 26 Dec 2007 16:26:11 -0800, julian wrote: > > Resending as my mailer made a dog's breakfast of the first one > with all sorts of wierd line breaks... 
hopefully this will be better. > (I haven't sent it yet so I'm hoping).. > > > ------------------------------------------- > > > > On thing where FreeBSD has been falling behind, and which by chance > I have some time to work on is "policy based routing", which allows > different packet streams to be routed by more than just the > destination address. > > Constraints: > ------------ > > I want to make some form of this available in the 6.x tree > (and by extension 7.x) , but FreeBSD in general needs it so I might as > well > do it in -current and back port the portions I need. > > One of the ways that this can be done is to have the ability to > instantiate multiple kernel routing tables (which I will now > refer to as "Forwarding Information Bases" or "FIBs" for political > correctness reasons. Which FIB a particular packet uses to make > the next hop decision can be decided by a number of mechanisms. > The policies these mechanisms implement are the "Policies" referred > to in "Policy based routing". > > One of the constraints I have if I try to back port this work to > 6.x is that it must be implemented as a EXTENSION to the existing > ABIs in 6.x so that third party applications do not need to be > recompiled in timespan of the branch. > > Implementation method, (part 1) > ------------------------------- > For this reason I have implemented a "sufficient subset" of a > multiple routing table solution in Perforce, and back-ported it > to 6.x. (also in Perforce though not yet caught up with what I > have done in -current/P4). The subset allows a number of FIBs > to be defined at compile time (sufficient for my purposes in 6.x) and > implements the changes needed to allow IPV4 to use them. I have not done > the changes for ipv6 simply because I do not need it, and I do not > have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. 
> > Other protocol families are left untouched and should there be > users with proprietary protocol families, they should continue to work > and be oblivious to the existence of the extra FIBs. > > To understand how this is done, one must know that the current FIB > code starts everything off with a single dimensional array of > pointers to FIB head structures (One per protocol family), each of > which in turn points to the trie of routes available to that family. > > The basic change in the ABI compatible version of the change is to > extent that array to be a 2 dimensional array, so that > instead of protocol family X looking at rt_tables[X] for the > table it needs, it looks at rt_tables[Y][X] when for all > protocol families except ipv4 Y is always 0. > Code that is unaware of the change always just sees the first row > of the table, which of course looks just like the one dimensional > array that existed before. > > > The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() > are all maintained, but refer only to the first row of the array, > so that existing callers in proprietary protocols can continue to > do the "right thing". > Some new entry points are added, for the exclusive use of ipv4 code > called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), > which have an extra argument which refers the code to the correct row. > > In addition, there are some new entry points (currently called > dom_rtalloc() and friends) that check the Address family being > looked up and call either rtalloc() (and friends) if the protocol > is not IPv4 forcing the action to row 0 or to the appropriate row > if it IS IPv4 (and that info is available). These are for calling > from code that is not specific to any particular protocol. The way > these are implemented would change in the non ABI preserving code > to be added later. 
> > One feature of the first version of the code is that for ipv4, > the interface routes show up automatically on all the FIBs, so > that no matter what FIB you select you always have the basic > direct attached hosts available to you. (rtinit() does this > automatically). > You CAN delete an interface route from one FIB should you want > to but by default it's there. ARP information is also available > in each FIB. It's assumed that the same machine would have the > same MAC address, regardless of which FIB you are using to get > to it. > > > This brings us as to how the correct FIB is selected for an outgoing > IPV4 packet. > > Packets fall into one of a number of classes. > 1/ locally generated packets, coming from a socket/PCB. > Such packets select a FIB from a number associated with the > socket/PCB. This in turn is inherited from the process, > but can be changed by a socket option. The process in turn > inherits it on fork. I have written a utility call setfib > that acts a bit like nice.. > > setfib -n 3 ping target.example.com # will use fib 3 for ping. > > 2/ packets received on an interface for forwarding. > By default these packets would use table 0, > (or possibly a number settable in a sysctl(not yet)). > but prior to routing the firewall can inspect them (see below). > > 3/ packets inspected by a packet classifier, which can arbitrarily > associate a fib with it on a packet by packet basis. > A fib assigned to a packet by a packet classifier > (such as ipfw) would over-ride a fib associated by > a more default source. (such as cases 1 or 2). > > Routing messages would be associated with their > process, and thus select one FIB or another. > > In addition Netstat has been edited to be able to cope with the > fact that the array is now 2 dimensional. (It looks in system > memory using libkvm (!)). > > In addition two sysctls are added to give: > a) the number of FIBs compiled in (active) > b) the default FIB of the calling process. 
> > Early testing experience: > ------------------------- > > Basically our (IronPort's) appliance does this functionality already > using ipfw fwd but that method has some drawbacks. > > For example, > It can't fully simulate a routing table because it can't influence the > socket's choice of local address when a connect() is done. > > > Testing during the generating of these changes has been > remarkably smooth so far. Multiple tables have co-existed > with no notable side effects, and packets have been routes > accordingly. > > I have not yet added the changes to ipfw. > pf has some similar changes already but they seem to rely on > the various FIBs having symbolic names. Which I do not plan to support > in the first version of these changes. > > SCTP has interestingly enough built in support for this, called VRFs > in Cisco parlance. it will be interesting to see how that handles it > when it suddenly actually does something. > > I have not redone my testing since my last edits, but will be > retesting with the current code asap. > > > Where to next: > -------------------- > > After committing the ABI compatible version and MFCing it, I'd > like to proceed in a forward direction in -current. this will > result in some roto-tilling in the routing code. > > Firstly: the current code's idea of having a separate tree per > protocol family, all of the same format, and pointed to by the > 1 dimensional array is a bit silly. Especially when one considers that > there > is code that makes assumptions about every protocol having the same > internal structures there. Some protocols don't WANT that > sort of structure. (for example the whole idea of a netmask is foreign > to appletalk). This needs to be made opaque to the external code. > > My suggested first change is to add routing method pointers to the > 'domain' structure, along with information pointing the data. 
> instead of having an array of pointers to uniform structures, > there would be an array pointing to the 'domain' structures > for each protocol address domain (protocol family), > and the methods this reached would be called. The methods would have > an argument that gives FIB number, but the protocol would be free > to ignore it. > > Interaction with the ARP layer/ LL layer would need to be > revisited as well. Qing Li has been working on this already. > > > diffs > for those with p4 access: > p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 > //depot/user/julian/routing/src/sys/... > > for those with the makediff perl script: > perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 > //depot/user/julian/routing/src/sys/... > > for those with neither: > > http://people.freebsd.org/~julian/mrt2.diff > > I just put the userland utility in usr.sbin/setfib/ in p4. > and changes to netstat in usr.bin/netstat/ > > see: > http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO > > > > > I'd like to get comments on this (compat) version, so that I can > commit it, get general testing under way to start the clock for MFC, > and then get moving on the fuller implementation (that breaks ABIs) > and other routing issues. > How does this work with Marko Zec's virtual stack system? 
Best, George From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 15:15:05 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 868E716A469 for ; Fri, 28 Dec 2007 15:15:05 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: from hs-out-2122.google.com (hs-out-0708.google.com [64.233.178.243]) by mx1.freebsd.org (Postfix) with ESMTP id B7FB913C448 for ; Fri, 28 Dec 2007 15:15:04 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: by hs-out-2122.google.com with SMTP id j58so2833643hsj.11 for ; Fri, 28 Dec 2007 07:15:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=Xgce59oOh55fxeTG2J6YoIG8nu/AgUUjXBJ/AWMokiQ=; b=fzgYaAoedMzuUszJkE+GNfe/ea4DqL7/0IEoBi5AP0yY3yaII2mMzfLKQU5nYMg36cPsoN4mmbrrPfFAL24SH2o0PfDd4ZRdjLR9Ro1YHMlLm4ooQQKe8sBU9shqfzVISxuZ7QkSZYosPB5cWncR+GKj1iONsVSiFTLHjvKB7Zw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=h+hJojK1RXsDr/WZwnFxT9GJjnvQLycMaqF82K6zCN/ip4cM35z2JSdhkaVJnnZe1cB+7Xs5Ea5MwZVvyiB1RWwedWe8TTQbY8+wRUCNK9hqVY+i0poM+MKuM5KszSeKGeBgDOaUow4eVcuAiCPfiepyUaNhmAnFp+5+1jB5mu0= Received: by 10.150.197.8 with SMTP id u8mr2605151ybf.131.1198854903741; Fri, 28 Dec 2007 07:15:03 -0800 (PST) Received: by 10.150.219.5 with HTTP; Fri, 28 Dec 2007 07:15:03 -0800 (PST) Message-ID: Date: Fri, 28 Dec 2007 17:15:03 +0200 From: "Ivo Vachkov" To: "Julian Elischer" In-Reply-To: <477416CC.4090906@elischer.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <4772F123.5030303@elischer.org> 
<477416CC.4090906@elischer.org> Cc: FreeBSD Net , Robert Watson , Qing Li , arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 15:15:05 -0000 On Dec 27, 2007 11:19 PM, Julian Elischer wrote: > > Ivo Vachkov wrote: > > On Dec 27, 2007 2:26 AM, Julian Elischer wrote: > >> Resending as my mailer made a dog's breakfast of the first one > >> with all sorts of weird line breaks... hopefully this will be better. > >> (I haven't sent it yet so I'm hoping).. > >> > >> > >> ------------------------------------------- > >> > >> > >> > >> One thing where FreeBSD has been falling behind, and which by chance I > >> have some time to work on is "policy based routing", which allows > >> different > >> packet streams to be routed by more than just the destination address. > >> > >> Constraints: > >> ------------ > >> > >> I want to make some form of this available in the 6.x tree > >> (and by extension 7.x), but FreeBSD in general needs it so I might as > >> well > >> do it in -current and back port the portions I need. > >> > >> One of the ways that this can be done is to have the ability to > >> instantiate multiple kernel routing tables (which I will now > >> refer to as "Forwarding Information Bases" or "FIBs" for political > >> correctness reasons). Which FIB a particular packet uses to make > >> the next hop decision can be decided by a number of mechanisms. > >> The policies these mechanisms implement are the "Policies" referred > >> to in "Policy based routing". 
> >> > >> One of the constraints I have if I try to back port this work to > >> 6.x is that it must be implemented as an EXTENSION to the existing > >> ABIs in 6.x so that third party applications do not need to be > >> recompiled in the timespan of the branch. > >> > >> Implementation method, (part 1) > >> ------------------------------- > >> For this reason I have implemented a "sufficient subset" of a > >> multiple routing table solution in Perforce, and back-ported it > >> to 6.x. (also in Perforce though not yet caught up with what I > >> have done in -current/P4). The subset allows a number of FIBs > >> to be defined at compile time (sufficient for my purposes in 6.x) and > >> implements the changes needed to allow IPV4 to use them. I have not done > >> the changes for ipv6 simply because I do not need it, and I do not > >> have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. > > By the way, I might add that in the 6.x compat. version I may end up > limiting the feature to 8 tables. This is because I need to store some > stuff in an efficient way in the mbuf, and in a compatible manner this > is easiest done by stealing the top 4 bits in the mbuf flags word > and defining them as: > > #define M_HAVEFIB 0x10000000 /* a FIB number is stored */ > #define M_FIBMASK 0x07 > #define M_FIBNUM 0xe0000000 /* the 3 bits holding the FIB number */ > #define M_FIBSHIFT 29 > #define m_getfib(_m, _default) (((_m)->m_flags & M_HAVEFIB) ? \ > (((_m)->m_flags >> M_FIBSHIFT) & M_FIBMASK) : (_default)) > #define M_SETFIB(_m, _fib) do { \ > (_m)->m_flags &= ~M_FIBNUM; \ > (_m)->m_flags |= (M_HAVEFIB|(((_fib) & M_FIBMASK) << M_FIBSHIFT));\ > } while (0) > > This then becomes very easy to change to use a tag or > whatever is needed in later versions, and the number can > be expanded past 8 predefined FIBs at that time. > > >> > >> Other protocol families are left untouched and should there be > >> users with proprietary protocol families, they should continue to work > >> and be oblivious to the existence of the extra FIBs. 
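The flag-word encoding quoted above can be tried out in userland. What follows is only a minimal sketch under stated assumptions: struct fake_mbuf is a hypothetical stand-in for the real struct mbuf, its flags word is declared unsigned (the kernel field is a plain int) so the shifts stay well defined, and the accessors are written as inline functions rather than macros. The bit layout (one "have FIB" bit at bit 28, three FIB-number bits at 29-31) is the one given in the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mbuf; only the flags word matters here. */
struct fake_mbuf {
    uint32_t m_flags;
};

#define M_HAVEFIB   0x10000000u  /* bit 28: a FIB number has been stored */
#define M_FIBNUM    0xe0000000u  /* bits 29-31: the FIB number itself */
#define M_FIBSHIFT  29
#define M_FIBMASK   0x07u        /* 3 bits, so FIBs 0..7 */

/* Return the FIB stored in the flags, or fallback if none was stored. */
static inline unsigned
m_getfib(const struct fake_mbuf *m, unsigned fallback)
{
    if ((m->m_flags & M_HAVEFIB) == 0)
        return fallback;
    return (m->m_flags >> M_FIBSHIFT) & M_FIBMASK;
}

/* Store fib in the top bits without disturbing the other 28 flag bits. */
static inline void
m_setfib(struct fake_mbuf *m, unsigned fib)
{
    assert(fib <= M_FIBMASK);
    m->m_flags &= ~M_FIBNUM;
    m->m_flags |= M_HAVEFIB | ((uint32_t)fib << M_FIBSHIFT);
}
```

The separate "have FIB" bit is what lets FIB 0 be distinguished from "no FIB assigned yet", which is why 4 bits are needed for 8 tables.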
> >> > >> To understand how this is done, one must know that the current FIB > >> code starts everything off with a single dimensional array of > >> pointers to FIB head structures (One per protocol family), each of > >> which in turn points to the trie of routes available to that family. > >> > >> The basic change in the ABI compatible version of the change is to > >> extend that array to be a 2 dimensional array, so that > >> instead of protocol family X looking at rt_tables[X] for the > >> table it needs, it looks at rt_tables[Y][X] where for all > >> protocol families except ipv4 Y is always 0. > >> Code that is unaware of the change always just sees the first row > >> of the table, which of course looks just like the one dimensional > >> array that existed before. > > > > Pretty much like the OpenBSD approach :) > > well, I did look at the code briefly, but I didn't base it on it.. > > > > > >> The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() > >> are all maintained, but refer only to the first row of the array, > >> so that existing callers in proprietary protocols can continue to > >> do the "right thing". > >> Some new entry points are added, for the exclusive use of ipv4 code > >> called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), > >> which have an extra argument which refers the code to the correct row. > >> > >> In addition, there are some new entry points (currently called > >> dom_rtalloc() and friends) that check the Address family being > >> looked up and call either rtalloc() (and friends) if the protocol > >> is not IPv4 forcing the action to row 0 or to the appropriate row > >> if it IS IPv4 (and that info is available). These are for calling > >> from code that is not specific to any particular protocol. The way > >> these are implemented would change in the non ABI preserving code > >> to be added later. 
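The two-dimensional table and the three families of entry points described above might be pictured roughly as follows. This is a toy sketch, not the code in Perforce: every toy_ name, the table sizes, and the simplified signatures (a family number instead of a sockaddr or struct route) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_AF_INET   2   /* toy constant, chosen to match the usual AF_INET */
#define TOY_AF_MAX    4   /* hypothetical: highest toy address family */
#define TOY_NUMFIBS   4   /* hypothetical compile-time FIB count */

/* Stand-in for the per-family head of a routing trie. */
struct toy_rnh {
    int family;
    int fib;
};

/* Row = FIB number, column = address family. Row 0 is the old 1-D array. */
static struct toy_rnh *toy_rt_tables[TOY_NUMFIBS][TOY_AF_MAX + 1];

/* Legacy-style entry point: no FIB argument, always row 0. */
static struct toy_rnh *
toy_rtalloc(int family)
{
    return toy_rt_tables[0][family];
}

/* IPv4-only entry point: the extra argument picks the row. */
static struct toy_rnh *
toy_in_rtalloc(unsigned fib)
{
    assert(fib < TOY_NUMFIBS);
    return toy_rt_tables[fib][TOY_AF_INET];
}

/* Protocol-neutral entry point: honours fib only for IPv4. */
static struct toy_rnh *
toy_dom_rtalloc(int family, unsigned fib)
{
    if (family == TOY_AF_INET)
        return toy_in_rtalloc(fib);
    return toy_rtalloc(family);
}
```

Because callers of the legacy function never supply a row, they keep seeing exactly what they saw before the array grew a second dimension.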
> >> > >> One feature of the first version of the code is that for ipv4, > >> the interface routes show up automatically on all the FIBs, so > >> that no matter what FIB you select you always have the basic > >> direct attached hosts available to you. (rtinit() does this > >> automatically). > >> You CAN delete an interface route from one FIB should you want > >> to but by default it's there. ARP information is also available > >> in each FIB. It's assumed that the same machine would have the > >> same MAC address, regardless of which FIB you are using to get > >> to it. > >> > >> > >> This brings us to how the correct FIB is selected for an outgoing > >> IPV4 packet. > >> > >> Packets fall into one of a number of classes. > >> 1/ locally generated packets, coming from a socket/PCB. > >> Such packets select a FIB from a number associated with the > >> socket/PCB. This in turn is inherited from the process, > >> but can be changed by a socket option. The process in turn > >> inherits it on fork. I have written a utility called setfib > >> that acts a bit like nice. > >> > >> setfib -n 3 ping target.example.com # will use fib 3 for ping. > >> > >> 2/ packets received on an interface for forwarding. > >> By default these packets would use table 0 > >> (or possibly a number settable in a sysctl (not yet)), > >> but prior to routing the firewall can inspect them (see below). > >> > >> 3/ packets inspected by a packet classifier, which can arbitrarily > >> associate a fib with it on a packet by packet basis. > >> A fib assigned to a packet by a packet classifier > >> (such as ipfw) would over-ride a fib associated by > >> a more default source. (such as cases 1 or 2). > > > > For the 2/ and 3/ cases I added (in a personal work I've been doing > > lately) an additional field in struct mbuf which can be set by a packet > > filter or other application upon receiving which points to the right > > table to use for the lookup. 
This way a simple "marking" can be used > > to divide different flows and create policy based routing. > > This would be the final way but I want to really minimise problems > in the compat versions, so I'll avoid doing that for now. > > Do you have this work available? I have it. However, I'll break an NDA if I 'open' it. > And have you looked at my diffs below? I plan to look at your code asap. > > > > >> Routing messages would be associated with their > >> process, and thus select one FIB or another. > >> > >> In addition Netstat has been edited to be able to cope with the > >> fact that the array is now 2 dimensional. (It looks in system > >> memory using libkvm (!)). > >> > >> In addition two sysctls are added to give: > >> a) the number of FIBs compiled in (active) > >> b) the default FIB of the calling process. > >> > >> Early testing experience: > >> ------------------------- > >> > >> Basically our (IronPort's) appliance provides this functionality already > >> using ipfw fwd but that method has some drawbacks. > >> > >> For example, > >> It can't fully simulate a routing table because it can't influence the > >> socket's choice of local address when a connect() is done. > >> > >> > >> Testing during the generating of these changes has been > >> remarkably smooth so far. Multiple tables have co-existed > >> with no notable side effects, and packets have been routed > >> accordingly. > >> > >> I have not yet added the changes to ipfw. > >> pf has some similar changes already but they seem to rely on > >> the various FIBs having symbolic names, which I do not plan to support > >> in the first version of these changes. > >> > >> SCTP has interestingly enough built in support for this, called VRFs > >> in Cisco parlance. It will be interesting to see how that handles it > >> when it suddenly actually does something. > >> > >> I have not redone my testing since my last edits, but will be > >> retesting with the current code asap. 
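The three packet classes listed earlier in the quoted proposal amount to a precedence order: a FIB set by a classifier wins over one inherited from the socket, which wins over the forwarding default. A small sketch of that rule, assuming a hypothetical FIB_UNSET marker and a hypothetical select_fib() helper (neither name comes from the patch):

```c
#include <assert.h>

#define FIB_UNSET (-1)   /* hypothetical marker: no FIB chosen at this level */

/*
 * Pick the FIB for an outgoing IPv4 packet.
 *   classifier_fib: set by a packet classifier such as ipfw, or FIB_UNSET
 *   socket_fib:     FIB of the originating socket/PCB, or FIB_UNSET for
 *                   forwarded packets that have no local socket
 *   default_fib:    table used when nothing else applies (0 in the proposal)
 */
static int
select_fib(int classifier_fib, int socket_fib, int default_fib)
{
    assert(default_fib >= 0);
    if (classifier_fib != FIB_UNSET)
        return classifier_fib;   /* class 3 overrides classes 1 and 2 */
    if (socket_fib != FIB_UNSET)
        return socket_fib;       /* class 1: locally generated */
    return default_fib;          /* class 2: received for forwarding */
}
```

Note that this only covers the route lookup; as the mail points out, a classifier-based override cannot influence the local address a socket picks at connect() time, which is one reason the per-socket FIB exists at all.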
> >> > >> > >> Where to next: > >> -------------------- > >> > >> After committing the ABI compatible version and MFCing it, I'd > >> like to proceed in a forward direction in -current. This will > >> result in some roto-tilling in the routing code. > >> > >> Firstly: the current code's idea of having a separate tree per > >> protocol family, all of the same format, and pointed to by the > >> 1 dimensional array is a bit silly. Especially when one considers that > >> there > >> is code that makes assumptions about every protocol having the same > >> internal structures there. Some protocols don't WANT that > >> sort of structure. (for example the whole idea of a netmask is foreign > >> to appletalk). This needs to be made opaque to the external code. > >> > >> My suggested first change is to add routing method pointers to the > >> 'domain' structure, along with information pointing to the data. > >> Instead of having an array of pointers to uniform structures, > >> there would be an array pointing to the 'domain' structures > >> for each protocol address domain (protocol family), > >> and the methods thus reached would be called. The methods would have > >> an argument that gives the FIB number, but the protocol would be free > >> to ignore it. > >> > >> Interaction with the ARP layer/ LL layer would need to be > >> revisited as well. Qing Li has been working on this already. > >> > >> > >> diffs > >> for those with p4 access: > >> p4 diff2 -du //depot/vendor/freebsd/src/sys/...@131121 > >> //depot/user/julian/routing/src/sys/... > >> > >> for those with the makediff perl script: > >> perl ~/makediff.pl //depot/vendor/freebsd/src/sys/...@131121 > >> //depot/user/julian/routing/src/sys/... > >> > >> for those with neither: > >> > >> http://people.freebsd.org/~julian/mrt2.diff > >> > >> I just put the userland utility in usr.sbin/setfib/ in p4. 
> >> and changes to netstat in usr.bin/netstat/ > >> > >> see: > >> http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO > >> > >> > >> > >> > >> I'd like to get comments on this (compat) version, so that I can > >> commit it, > >> get general testing under way to start the clock for MFC, and then get > >> moving on the fuller implementation (that breaks ABIs) and other > >> routing issues. > >> > >> > >> Julian > >> > >> > >> > >> > >> _______________________________________________ > >> freebsd-arch@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch > >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > >> > > -- "UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity." Dennis Ritchie From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 17:17:04 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B84016A421 for ; Fri, 28 Dec 2007 17:17:04 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outE.internet-mail-service.net (outE.internet-mail-service.net [216.240.47.228]) by mx1.freebsd.org (Postfix) with ESMTP id 2B4A313C459 for ; Fri, 28 Dec 2007 17:17:04 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Fri, 28 Dec 2007 09:17:03 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 01D3E126DA3; Fri, 28 Dec 2007 09:17:02 -0800 (PST) Message-ID: <47752F98.6050209@elischer.org> Date: Fri, 28 Dec 2007 09:17:12 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: gnn@freebsd.org References: <4772F123.5030303@elischer.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; 
format=flowed Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , Robert Watson , Qing Li , arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 17:17:04 -0000 gnn@freebsd.org wrote: > At Wed, 26 Dec 2007 16:26:11 -0800, > julian wrote: [...] > > How does this work with Marko Zec's virtual stack system? > > Best, > George orthogonal From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 20:01:50 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AAD2516A469; Fri, 28 Dec 2007 20:01:50 +0000 (UTC) (envelope-from zec@tel.fer.hr) Received: from xaqua.tel.fer.hr (xaqua.tel.fer.hr [161.53.19.25]) by mx1.freebsd.org (Postfix) with ESMTP id E7FEF13C45D; Fri, 28 Dec 2007 20:01:49 +0000 (UTC) (envelope-from zec@tel.fer.hr) Received: by xaqua.tel.fer.hr (Postfix, from userid 20006) id 9E5D99B742; Fri, 28 Dec 2007 20:42:43 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on xaqua.tel.fer.hr X-Spam-Level: X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.7 Received: from [192.168.200.112] (zec2.tel.fer.hr [161.53.19.79]) by xaqua.tel.fer.hr (Postfix) with ESMTP id ABBC89B6C9; Fri, 28 Dec 2007 20:42:40 +0100 (CET) From: Marko Zec To: freebsd-arch@freebsd.org, FreeBSD Net Date: Fri, 28 Dec 2007 20:40:30 +0100 User-Agent: KMail/1.9.7 References: <4772F123.5030303@elischer.org> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200712282040.30745.zec@tel.fer.hr> Cc: gnn@freebsd.org, Robert Watson , Julian Elischer , Qing Li Subject: Re: resend: 
multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 20:01:50 -0000 On Friday 28 December 2007 14:49:15 gnn@freebsd.org wrote: > At Wed, 26 Dec 2007 16:26:11 -0800, > > julian wrote: [...] 
> > and changes to netstat in usr.bin/netstat/ > > > > see: > > http://perforce.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/user/julian/routing/src&HIDEDEL=NO > > > > I'd like to get comments on this (compat) version, so that I can > > commit it, get general testing under way to start the clock for > > MFC, and then get moving on the fuller implementation (that breaks > > ABIs) and other routing issues. > > How does this work with Marko Zec's virtual stack system? The thrust behind Julian's work seems to be providing multiple forwarding tables for purposes of traffic engineering / policy based routing, with a single firewall instance used as a classifier. vimage-style network stack virtualization provides for stricter isolation on both port and IP address space, independent firewall instances, IPSEC config / state etc., and as such might be better suited for providing enhanced jail-style virtual hosting environments, as well as for providing virtual router "slices". So once we get Julian's multi-FIB stuff in the base system, I see no reason why we couldn't have this functionality replicated in each "vimage" instance, i.e. have multiple independent virtual networking environments, each with multiple FIBs. Implementation-wise, my hacks currently rely on macros for conditional virtualization of global variables / structs. As long as Julian's changes continue to be unconditional, i.e. without playing a similar macroization game, I think integrating this code (once it hits HEAD) into p4/projects/vimage should be more or less a straightforward job. 
Marko From owner-freebsd-net@FreeBSD.ORG Fri Dec 28 22:09:29 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB72616A468; Fri, 28 Dec 2007 22:09:29 +0000 (UTC) (envelope-from peo@intersonic.se) Received: from neonpark.inter-sonic.com (neonpark.inter-sonic.com [212.247.8.98]) by mx1.freebsd.org (Postfix) with ESMTP id 8B5FA13C46E; Fri, 28 Dec 2007 22:09:29 +0000 (UTC) (envelope-from peo@intersonic.se) X-Virus-Scanned: amavisd-new at inter-sonic.com Message-ID: <47757413.6010807@intersonic.se> Date: Fri, 28 Dec 2007 23:09:23 +0100 From: Per olof Ljungmark Organization: Intersonic AB User-Agent: Thunderbird 2.0.0.9 (X11/20071216) MIME-Version: 1.0 To: Stefan Lambrev References: <4774E2FB.2090107@intersonic.se> <4774E68E.7030200@moneybookers.com> <47750046.80705@intersonic.se> <4775229A.3040707@moneybookers.com> In-Reply-To: <4775229A.3040707@moneybookers.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maxime Henrion , freebsd-net@FreeBSD.org, freebsd-current Subject: Re: [Fwd: Re: rtfree: 0xc5caad98 has 2 refs] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Dec 2007 22:09:29 -0000 Stefan Lambrev wrote: > Hi, > > Can you replace all calls to rtfree() with RTFREE_LOCKED() in those files: > > netinet/if_ether.c > netinet6/nd6_nbr.c > netinet6/in6_ifattach.c > netinet6/in6_gif.c > > Of course do not forget net/route.c with the patch from the PR. > Recompile the kernel and check if this will cure your hangs? > > I'm not sure about the lock order reversal, may be it was introduced > with kbd_backtrace(). > You can remove it from route.c, replace rtfree() and build kernel with > debug, to see if the LOR is gone. 
> > It seems that the panic is caused by rtalloc1() called in route.c line > 333: > rt = rtalloc1(dst, 0, 0UL); /* NB: rt is locked */ > > most probably because rt is not locked :) > I'm out of ideas how to check if it is really locked, but you can > experiment with RT_LOCK() and RT_UNLOCK(). > Maybe mtx_trylock() can help too. > > Please share your findings with -net & -current if you did not before. > > =cut= Unfortunately I ran out of time before I could complete the test. However, I can report one more interesting finding from today: The icmp packets that trigger the bug probably come either from a Cisco router or the setup itself. Late today our network topology was changed. Previous setup: affected hosts ISP's router (default gw) .1 LAN ------------ router-------- wlan 1 (via ISP) | 192.168.3.0 our firewall .254 | fw ----------wlan 2 | 172.16.2.0 (isakmpd) | Internet Current setup: affected hosts our fw (OpenBSD) .1 192.168.3.0 LAN ------------ router------ wlan 1 (isakmpd) | | 172.16.2.0 (isakmpd) | --------wlan 2 | | Internet and this "fixed" the problem! We have no access to the Cisco so I don't know its configuration. But: No lockups, no "rtfree" messages. If the bug is still unresolved mid-January I can continue testing then. Thanks to all for your suggestions and help! 
--per From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 00:17:30 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9931416A473 for ; Sat, 29 Dec 2007 00:17:30 +0000 (UTC) (envelope-from tiffany.snyder@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.178]) by mx1.freebsd.org (Postfix) with ESMTP id 7224513C459 for ; Sat, 29 Dec 2007 00:17:30 +0000 (UTC) (envelope-from tiffany.snyder@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so6144557waf.3 for ; Fri, 28 Dec 2007 16:17:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=CdVoIH36qEnbPidpKj43UeADBK2JZVYfG27riZz/+vw=; b=HPLvEktvVnEfDnqt+vTLKwICVoKFD7XN5Hmbq/eWCSZQ8no/2w4dk/9o4cxjhmNSBa+rvriq8rKU7kw+pmd3JR4MdUzP/NdDQQMeMJ2+KcQ5RTaUIOo5XPwIbCqUVpouhN2zpUJDqZvJ9dK7DZNcuz0F0WpETFnHRuLLMkqOQdw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=Gg8j8gO3jmt96C7RTWxHWaGxTlI/rcQJgVyxQ/+0+aBrBKatLcEF0jAf2Ue2PLslKgs9FqHOfYscNfNkcUupFZviENaa2km0PIseuUGehC3WJXHa2YctRt8ljgFc/XDG13ULQo/DwaOnxbWAXizRG9jVLxiZtI/3S4da9US70Tg= Received: by 10.142.114.15 with SMTP id m15mr3262604wfc.235.1198885891831; Fri, 28 Dec 2007 15:51:31 -0800 (PST) Received: by 10.142.44.7 with HTTP; Fri, 28 Dec 2007 15:51:31 -0800 (PST) Message-ID: Date: Fri, 28 Dec 2007 15:51:31 -0800 From: "Tiffany Snyder" To: "Andre Oppermann" In-Reply-To: <43B47CB5.3C0F1632@freebsd.org> MIME-Version: 1.0 References: <43B45EEF.6060800@x-trader.de> <43B47CB5.3C0F1632@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: 
freebsd-net@freebsd.org Subject: Re: Routing SMP benefit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 00:17:30 -0000 Hi Andre, are those numbers for small (64-byte) packets? Good job on pushing the base numbers higher on the same HW. What piqued my attention was the note that our forwarding performance doesn't scale with multiple CPUs, which means there's a lot of work to be done :-) Have we taken a look at OpenSolaris' Surya (http://www.opensolaris.org/os/community/networking/surya-design.pdf) project? They allow multiple readers/single writer on the radix_node_head (and not a mutex as we do) and we may be able to do the same to gain some parallelism. There are other things in Surya that exploit multiple CPUs. It's definitely worth a read. DragonFlyBSD seems to achieve parallelism by classifying packets into flows and then redirecting the flows to different CPUs. OpenSolaris also does something similar. We can definitely think along those lines. NOTE: 1) I said multiple instead of dual CPUs on purpose. 2) I mentioned OpenSolaris and DragonFlyBSD as examples and to acknowledge the work they are doing and to show that FreeBSD is far behind and is losing its lustre as the networking platform of choice. Thanks, Tiffany. On 12/29/05, Andre Oppermann wrote: > Markus Oestreicher wrote: > > > > Currently running a few routers on 5-STABLE I have read the > > recent changes in the network stack with interest. > > You should run 6.0R. It contains many improvements over 5-STABLE. > > > A few questions come to my mind: > > > > - Can a machine that mainly routes packets between two em(4) > > interfaces benefit from a second CPU and SMP kernel? Can both > > CPUs process packets from the same interface in parallel? 
> > My testing has shown that a machine can benefit from it but not > much in the forwarding performance. The main benefit is the > prevention of livelock if you have very high packet loads. The > second CPU on SMP keeps on doing all userland tasks and running > routing protocols. Otherwise your BGP sessions or OSPF hellos > would stop and remove you from the routing cloud. > > > - From reading the lists it appears that net.isr.direct > > and net.ip.fastforwarding are doing similar things. Should > > they be used together or rather not? > > net.inet.ip.fastforwarding has precedence over net.isr.direct and > enabling both at the same time doesn't gain you anything. Fastforwarding > is about 30% faster than all other methods available, including > polling. On my test machine with two em(4) and an AMD Opteron 852 > (2.6GHz) I can route 580'000 pps with zero packet loss on -CURRENT. > An upcoming optimization that will go into -CURRENT in the next > few days pushes that to 714'000 pps. Further optimizations are > underway to make a stock kernel do close to or above 1'000'000 pps > on the same hardware. 
> > -- > Andre > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to " freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 02:10:09 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B44116A419 for ; Sat, 29 Dec 2007 02:10:09 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [62.111.66.27]) by mx1.freebsd.org (Postfix) with ESMTP id 3EFE913C43E for ; Sat, 29 Dec 2007 02:10:09 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.str.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id A645841C541 for ; Sat, 29 Dec 2007 03:10:07 +0100 (CET) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([62.111.66.27]) by localhost (amavis.str.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id zfApY9uAl-Hd for ; Sat, 29 Dec 2007 03:10:05 +0100 (CET) Received: by mail.cksoft.de (Postfix, from userid 66) id 53D6441C752; Sat, 29 Dec 2007 03:10:05 +0100 (CET) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 308AE444885 for ; Sat, 29 Dec 2007 02:08:08 +0000 (UTC) Date: Sat, 29 Dec 2007 02:08:08 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: freebsd-net@FreeBSD.org Message-ID: <20071229020307.P81630@maildrop.int.zabbadoz.net> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: [patch] ICMP unreach, frag needed but df set + ro mtu broken X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 02:10:09 -0000 Hi, while looking at an entirely different problem I found that the ICMP unreach "frag needed but DF set" handling in ip_forward does not take any route MTU into account, breaking PMTU discovery. I put a patch here: http://sources.zabbadoz.net/freebsd/patchset/patch-20071228-02-ip-forward-unreach-needfrag-ro.diff Testing and review welcome. PS: After a quick glance I think ip_fastfwd should be ok. -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT Software is harder than hardware so better get it right the first time. 
From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 03:02:29 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1E78316A418; Sat, 29 Dec 2007 03:02:29 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from outbound0.sv.meer.net (outbound0.mx.meer.net [209.157.153.23]) by mx1.freebsd.org (Postfix) with ESMTP id D4E7E13C442; Sat, 29 Dec 2007 03:02:28 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from mail.meer.net (mail.meer.net [209.157.152.14]) by outbound0.sv.meer.net (8.12.10/8.12.6) with ESMTP id lBT32Pih000379; Fri, 28 Dec 2007 19:02:25 -0800 (PST) (envelope-from gnn@neville-neil.com) Received: from minion.local.neville-neil.com (61.204.211.246.customerlink.pwd.ne.jp [61.204.211.246]) by mail.meer.net (8.13.3/8.13.3/meer) with ESMTP id lBT32OIN095874; Fri, 28 Dec 2007 19:02:24 -0800 (PST) (envelope-from gnn@neville-neil.com) Date: Sat, 29 Dec 2007 12:02:22 +0900 Message-ID: From: gnn@freebsd.org To: Marko Zec In-Reply-To: <200712282040.30745.zec@tel.fer.hr> References: <4772F123.5030303@elischer.org> <200712282040.30745.zec@tel.fer.hr> User-Agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.1.50 (i386-apple-darwin8.10.1) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Cc: FreeBSD Net , Qing Li , Robert Watson , Julian Elischer , freebsd-arch@freebsd.org Subject: Re: resend: multiple routing table roadmap (format fix) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 03:02:29 -0000 At Fri, 28 Dec 2007 20:40:30 +0100, Marko Zec wrote: > The thrust behind Julian's work seems to be providing multiple > 
forwarding tables for purposes of traffic engineering / policy > based routing, with a single firewall instance used as a classifier. > vimage-style network stack virtualization provides for more strict > isolation on both port and IP address space, independent firewall > instances, IPSEC config / state etc., and as such might be better > suited for providing enhanced jail-style virtual hosting environments, > as well as for providing virtual router "slices". > > So once we get Julian's multi-FIB stuff in the base system, I see no > reason why we couldn't have this functionality replicated in > each "vimage" instance, i.e. have multiple independent virtual > networking environments, each with multiple FIBs. > > Implementationwise, my hacks currently rely on macros for conditional > virtualization of global variables / structs. As long as Julian's > changes continue to be unconditional, i.e. without playing a similar > macroization game, I think integrating this code (once it hits HEAD) > into p4/projects/vimage should be more or less a straightforward job. Cool, that's what I wanted to hear. 
Best, George From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 12:49:53 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4BD7C16A417; Sat, 29 Dec 2007 12:49:53 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id CE3A713C45A; Sat, 29 Dec 2007 12:49:52 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id 17EE6209C; Sat, 29 Dec 2007 13:33:11 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.1/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id EA5312089; Sat, 29 Dec 2007 13:33:10 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id B1742844A7; Sat, 29 Dec 2007 13:33:10 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: net@freebsd.org, current@freebsd.org Date: Sat, 29 Dec 2007 13:33:10 +0100 Message-ID: <86ir2hznnd.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: kevlo@freebsd.org, sam@freebsd.org Subject: if_ral regression X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 12:49:53 -0000 I upgraded my router cum firewall cum access point (soekris net4801 with a cheap third-party ralink-based wlan adapter) from RELENG_6 to HEAD and noticed what seems to be a regression in if_ral. After a certain amount of use (i.e. 
actually having a client connected to it and transferring data), the connection falters, and eventually the client can no longer even see the access point in a scan. Restarting the interface on the router (/etc/rc.d/netif restart ral0) fixes it. I now have a cron job that does this every five minutes. I still get occasional outages, but all I have to do is wait a few minutes for the cron job to kick in. Outages are clearly related to traffic; a sure-fire way to trigger one is to start a backup job on my laptop (rsync to my file server). I will lose the wlan connection repeatedly until I either stop trying or run the script with a bandwidth limit. des@soe ~% uname -a FreeBSD soe.des.no 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Dec 15 20:46:29 UTC 2007 des@pwd.des.no:/usr/obj/usr/src/sys/soe i386 des@soe ~% kldstat -v Id Refs Address Size Name 1 18 0xc0400000 33fdfc kernel (/boot/soe/kernel) 2 1 0xc0740000 7690 if_sis.ko (/boot/soe/if_sis.ko) 3 2 0xc0748000 1dbe0 miibus.ko (/boot/soe/miibus.ko) 4 1 0xc0766000 18e28 if_ral.ko (/boot/soe/if_ral.ko) 5 4 0xc077f000 2a95c wlan.ko (/boot/soe/wlan.ko) 6 1 0xc07aa000 2cb0 wlan_acl.ko (/boot/soe/wlan_acl.ko) 7 1 0xc07ad000 1924 wlan_scan_ap.ko (/boot/soe/wlan_scan_ap.ko) 8 1 0xc107f000 6000 geom_md.ko (/boot/soe/geom_md.ko) 9 1 0xc10f9000 2000 pflog.ko (/boot/soe/pflog.ko) 10 1 0xc10fb000 2f000 pf.ko (/boot/soe/pf.ko) 11 4 0xc118d000 a000 netgraph.ko (/boot/soe/netgraph.ko) 12 1 0xc119c000 3000 ng_ether.ko (/boot/soe/ng_ether.ko) 13 1 0xc11a8000 5000 ng_pppoe.ko (/boot/soe/ng_pppoe.ko) 14 1 0xc11ad000 4000 ng_socket.ko (/boot/soe/ng_socket.ko) des@soe ~% grep ral0 /var/run/dmesg.boot ral0: mem 0xa0004000-0xa0005fff irq 11 at device 10.0 on pci0 ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525 ral0: Ethernet address: 00:08:a1:8d:2f:73 ral0: [ITHREAD] des@soe ~% pciconf -lv [...] 
ral0@pci0:0:10:0: class=0x028000 card=0x00201371 chip=0x02011814 rev=0x01 hdr=0x00 vendor = 'Ralink Technology, Corp' device = '0x03011814 Zonet ZEW1601 (Ralink Chipset) 802.11b/g WLAN Card' class = network [...] des@soe ~% ifconfig ral0 ral0: flags=8843 metric 0 mtu 1500 ether 00:08:a1:8d:2f:73 inet 10.0.11.1 netmask 0xffffff00 broadcast 10.0.11.255 media: IEEE 802.11 Wireless Ethernet autoselect mode 11g status: associated ssid des.no channel 1 (2412 Mhz 11g) bssid 00:08:a1:8d:2f:73 authmode OPEN privacy OFF txpower 50 scanvalid 60 bgscan bgscanintvl 300 bgscanidle 250 roam:rssi11g 7 roam:rate11g 5 protmode CTS dtimperiod 1 DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 13:20:31 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E15D816A46C for ; Sat, 29 Dec 2007 13:20:31 +0000 (UTC) (envelope-from kimimeister@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.183]) by mx1.freebsd.org (Postfix) with ESMTP id BFFEB13C458 for ; Sat, 29 Dec 2007 13:20:31 +0000 (UTC) (envelope-from kimimeister@gmail.com) Received: by wa-out-1112.google.com with SMTP id k17so6559808waf.3 for ; Sat, 29 Dec 2007 05:20:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:mime-version:content-type:content-transfer-encoding:content-disposition; bh=RDxdEdPFLz7SAVRwiwmpKjLyfokbXK0ldsE00ZbB96w=; b=kFgPltRCAjZtYoYGp7htP1JBbjwUOWv3aqw1zovQUQzKCvjQmdW4eOwFUrpPWut4Ow/7vI+nTIX+W1I81rP/+1zFoOxAYNOb5M6WiaI5aFvNG+3gsDOSXSq/0LFidcXvDQQRBXJlh19oap2Uu5TqUc95/iI/I4RNzgy31vTl7ek= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:mime-version:content-type:content-transfer-encoding:content-disposition; 
b=ZcDPQlwdrd19giyDhb3fkjsv9Hr82MeFmqpd3XYXbEdB9DeKh70HwP9/ZIj2cY0iTHfwze/4a0oCR7xi6c4Qi1jZ23gYJ+fv1zQNyFO6W3HRy7oVkbu4OrK7kBynpB17Ivd3et9KsD3ejbnEBayQY+KKarHMWKaOrG2LfwkxwIk= Received: by 10.115.90.1 with SMTP id s1mr7653305wal.41.1198932725716; Sat, 29 Dec 2007 04:52:05 -0800 (PST) Received: by 10.114.111.17 with HTTP; Sat, 29 Dec 2007 04:52:05 -0800 (PST) Message-ID: <42b497160712290452q1c33d561n394ecf642e7cd1de@mail.gmail.com> Date: Sat, 29 Dec 2007 12:52:05 +0000 From: Kimi To: current@freebsd.org, net@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Cc: des@des.no Subject: if_ral regression X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 13:20:32 -0000 Did you check the archives? One suggested "fix" is to turn off direct dispatch in netisr: sysctl net.isr.direct=0 fixed things for me when I had ral-based cards, and I do this by default on any machine with wireless, even though I use ath-based cards now. 
Happy new year -- Kimi From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 20:45:10 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CED2116A420 for ; Sat, 29 Dec 2007 20:45:10 +0000 (UTC) (envelope-from bazzoola@gmail.com) Received: from qb-out-0506.google.com (qb-out-0506.google.com [72.14.204.233]) by mx1.freebsd.org (Postfix) with ESMTP id 7F8EE13C45A for ; Sat, 29 Dec 2007 20:45:10 +0000 (UTC) (envelope-from bazzoola@gmail.com) Received: by qb-out-0506.google.com with SMTP id a10so68507qbd.7 for ; Sat, 29 Dec 2007 12:45:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:cc:message-id:from:to:in-reply-to:content-type:content-transfer-encoding:mime-version:subject:date:references:x-mailer; bh=wD/D/1ti28yp46aUZi08QVKM47IRgkVF+Ftug6jjWKc=; b=afiU66ll1fgFOQG5tWCwjNe0guvc8CP7woUCOThv54TXcmTjbonEYHj7uU6hIJPoEOTbZQ6pv6wseA93cwST3Sv8x7NgCmFoqjGTy66NPWqKOyJVKNah9jy4tiRIeE+RivKmbfhlRiQ/V58FiZ69TI8+v2bsW2CIgWgtq7NejoM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=cc:message-id:from:to:in-reply-to:content-type:content-transfer-encoding:mime-version:subject:date:references:x-mailer; b=NktgvXP+bjdK/CZwQzpKqVAiJ1DxD02Ht1jndbHBPvDlwPzlEBmgYERW71XdZAo26BlkbOK++JIYn7gCSvOrcSz7YNrjk1G7rAMlTSD6O1XG/TqbEBWdpWnqcnfD5MUr+WlxH5ksPDUPagzbPkgzxheY1Rh05ztaxjT6ZrXYGao= Received: by 10.65.54.9 with SMTP id g9mr21188927qbk.3.1198960140007; Sat, 29 Dec 2007 12:29:00 -0800 (PST) Received: from macwire.local ( [76.247.130.45]) by mx.google.com with ESMTPS id 38sm9415839nzf.10.2007.12.29.12.28.57 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 29 Dec 2007 12:28:58 -0800 (PST) Message-Id: From: bazzoola To: =?ISO-8859-1?Q?Dag-Erling_Sm=F8rgrav?= In-Reply-To: <86ir2hznnd.fsf@ds4.des.no> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: 
quoted-printable Mime-Version: 1.0 (Apple Message framework v915) Date: Sat, 29 Dec 2007 15:28:55 -0500 References: <86ir2hznnd.fsf@ds4.des.no> X-Mailer: Apple Mail (2.915) Cc: kevlo@freebsd.org, sam@freebsd.org, current@freebsd.org, net@freebsd.org Subject: Re: if_ral regression X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 20:45:11 -0000 On Dec 29, 2007, at 7:33 AM, Dag-Erling Smørgrav wrote: > I upgraded my router cum firewall cum access point (soekris net4801 > with > a cheap third-party ralink-based wlan adapter) from RELENG_6 to HEAD > and > noticed what seems to be a regression in if_ral. After a certain > amount > of use (i.e. actually having a client connected to it and transferring > data), the connection falters, and eventually the client can no longer > even see the access point in a scan. Restarting the interface on > the router (/etc/rc.d/netif restart ral0) fixes it. I now have a cron > job that does this every five minutes. I still get occasional > outages, > but all I have to do is wait a few minutes for the cron job to kick > in. > > Outages are clearly related to traffic; a sure-fire way to trigger one > is to start a backup job on my laptop (rsync to my file server). I > will > lose the wlan connection repeatedly until I either stop trying or run > the script with a bandwidth limit. 
> > des@soe ~% uname -a > FreeBSD soe.des.no 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Dec 15 > 20:46:29 UTC 2007 des@pwd.des.no:/usr/obj/usr/src/sys/soe i386 > des@soe ~% kldstat -v > Id Refs Address Size Name > 1 18 0xc0400000 33fdfc kernel (/boot/soe/kernel) > 2 1 0xc0740000 7690 if_sis.ko (/boot/soe/if_sis.ko) > 3 2 0xc0748000 1dbe0 miibus.ko (/boot/soe/miibus.ko) > 4 1 0xc0766000 18e28 if_ral.ko (/boot/soe/if_ral.ko) > 5 4 0xc077f000 2a95c wlan.ko (/boot/soe/wlan.ko) > 6 1 0xc07aa000 2cb0 wlan_acl.ko (/boot/soe/wlan_acl.ko) > 7 1 0xc07ad000 1924 wlan_scan_ap.ko (/boot/soe/wlan_scan_ap.ko) > 8 1 0xc107f000 6000 geom_md.ko (/boot/soe/geom_md.ko) > 9 1 0xc10f9000 2000 pflog.ko (/boot/soe/pflog.ko) > 10 1 0xc10fb000 2f000 pf.ko (/boot/soe/pf.ko) > 11 4 0xc118d000 a000 netgraph.ko (/boot/soe/netgraph.ko) > 12 1 0xc119c000 3000 ng_ether.ko (/boot/soe/ng_ether.ko) > 13 1 0xc11a8000 5000 ng_pppoe.ko (/boot/soe/ng_pppoe.ko) > 14 1 0xc11ad000 4000 ng_socket.ko (/boot/soe/ng_socket.ko) > des@soe ~% grep ral0 /var/run/dmesg.boot > ral0: mem 0xa0004000-0xa0005fff irq 11 at > device 10.0 on pci0 > ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525 > ral0: Ethernet address: 00:08:a1:8d:2f:73 > ral0: [ITHREAD] > des@soe ~% pciconf -lv > [...] > ral0@pci0:0:10:0: class=0x028000 card=0x00201371 > chip=0x02011814 rev=0x01 hdr=0x00 > vendor = 'Ralink Technology, Corp' > device = '0x03011814 Zonet ZEW1601 (Ralink Chipset) 802.11b/g > WLAN Card' > class = network > [...] 
> des@soe ~% ifconfig ral0 > ral0: flags=8843 metric 0 > mtu 1500 > ether 00:08:a1:8d:2f:73 > inet 10.0.11.1 netmask 0xffffff00 broadcast 10.0.11.255 > media: IEEE 802.11 Wireless Ethernet autoselect mode 11g > > status: associated > ssid des.no channel 1 (2412 Mhz 11g) bssid 00:08:a1:8d:2f:73 > authmode OPEN privacy OFF txpower 50 scanvalid 60 bgscan > bgscanintvl 300 bgscanidle 250 roam:rssi11g 7 roam:rate11g 5 > protmode CTS dtimperiod 1 > > DES > -- > Dag-Erling Smørgrav - des@des.no > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org > " I have reported a similar regression; please see http://www.freebsd.org/cgi/query-pr.cgi?pr=117655 http://lists.freebsd.org/pipermail/freebsd-stable/2007-October/037636.html Thanks! From owner-freebsd-net@FreeBSD.ORG Sat Dec 29 23:37:59 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 360F316A417; Sat, 29 Dec 2007 23:37:59 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (tim.des.no [194.63.250.121]) by mx1.freebsd.org (Postfix) with ESMTP id E7AA313C448; Sat, 29 Dec 2007 23:37:58 +0000 (UTC) (envelope-from des@des.no) Received: from tim.des.no (localhost [127.0.0.1]) by spam.des.no (Postfix) with ESMTP id B518420B1; Sun, 30 Dec 2007 00:37:50 +0100 (CET) X-Spam-Tests: AWL X-Spam-Learn: disabled X-Spam-Score: -0.1/3.0 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on tim.des.no Received: from ds4.des.no (des.no [80.203.243.180]) by smtp.des.no (Postfix) with ESMTP id 2A9C32099; Sun, 30 Dec 2007 00:37:50 +0100 (CET) Received: by ds4.des.no (Postfix, from userid 1001) id E7D308448A; Sun, 30 Dec 2007 00:37:49 +0100 (CET) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Kimi References: 
<42b497160712290452q1c33d561n394ecf642e7cd1de@mail.gmail.com> Date: Sun, 30 Dec 2007 00:37:49 +0100 In-Reply-To: <42b497160712290452q1c33d561n394ecf642e7cd1de@mail.gmail.com> (Kimi's message of "Sat\, 29 Dec 2007 12\:52\:05 +0000") Message-ID: <86abnt138y.fsf@ds4.des.no> User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/22.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: current@freebsd.org, net@freebsd.org Subject: Re: if_ral regression X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Dec 2007 23:37:59 -0000 Kimi writes: > sysctl net.isr.direct=0 Tried that, problem still occurs. DES -- Dag-Erling Smørgrav - des@des.no