From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 01:53:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CCC716A419; Sun, 16 Dec 2007 01:53:38 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from out3.smtp.messagingengine.com (out3.smtp.messagingengine.com [66.111.4.27]) by mx1.freebsd.org (Postfix) with ESMTP id 0016E13C455; Sun, 16 Dec 2007 01:53:37 +0000 (UTC) (envelope-from bms@FreeBSD.org) Received: from compute1.internal (compute1.internal [10.202.2.41]) by out1.messagingengine.com (Postfix) with ESMTP id 4F7667A81C; Sat, 15 Dec 2007 20:53:37 -0500 (EST) Received: from heartbeat1.messagingengine.com ([10.202.2.160]) by compute1.internal (MEProxy); Sat, 15 Dec 2007 20:53:37 -0500 X-Sasl-enc: i0i26hJIk+C1cE7IweXItrQkKeDv1sFLAc7IRl1mE99g 1197770017 Received: from empiric.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTP id 8AEFEC1D6; Sat, 15 Dec 2007 20:53:36 -0500 (EST) Message-ID: <4764851F.1000304@FreeBSD.org> Date: Sun, 16 Dec 2007 01:53:35 +0000 From: "Bruce M. 
Simpson" User-Agent: Thunderbird 2.0.0.6 (X11/20070928) MIME-Version: 1.0 To: Max Laier References: <47628E11.7030803@tomjudge.com> <4762AC1E.3030101@FreeBSD.org> <200712142030.14728.max@love2party.net> In-Reply-To: <200712142030.14728.max@love2party.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Tom Judge , freebsd-net , freebsd-pf@freebsd.org Subject: Re: Spurious error from i[pf]_carp X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 01:53:38 -0000 Max Laier wrote: > Alternatively you could change IPPROTO_CARP in netinet/in.h to another > unused protocol number. This is really the preferred way of dealing with > mixed CARP and VRRP environments as the CARP packets might in turn > irritate the VRRP routers, too. > This sounds like a common use case. Perhaps there is motivation for making the protocol number used by CARP a loader tunable? [I'd really like it if we had a kernel API for adding the virtual MAC addresses to ifnet too, then again I'd like the cheat for infinite chocolate fudge sundaes in life, bed and breakfast at The Savoy with my choice of actress, etc] > /* no comment */ > No disrespect to anyone intended, just that CARP does duplicate the functionality of VRRP. It's worth reiterating that this is what happens when software patents are allowed to creep in to the nuts and bolts of the operational Internet -- and thus, CARP was born, and thus Tom runs into the issue he has seen. 
later BMS From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 03:29:05 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6A1116A418; Sun, 16 Dec 2007 03:29:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CC6E213C458; Sun, 16 Dec 2007 03:29:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBG3T5iT040971; Sun, 16 Dec 2007 03:29:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBG3T5hO040967; Sun, 16 Dec 2007 03:29:05 GMT (envelope-from linimon) Date: Sun, 16 Dec 2007 03:29:05 GMT Message-Id: <200712160329.lBG3T5hO040967@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/118727: [ng] [patch] add new ng_pf module X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 03:29:06 -0000 Synopsis: [ng] [patch] add new ng_pf module Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Sun Dec 16 03:28:46 UTC 2007 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=118727 From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 08:26:56 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0D07A16A4CD for ; Sun, 16 Dec 2007 08:26:56 +0000 (UTC) (envelope-from randy@psg.com) Received: from rip.psg.com (rip.psg.com [147.28.0.39]) by mx1.freebsd.org (Postfix) with ESMTP id D28D813C468 for ; Sun, 16 Dec 2007 08:26:55 +0000 (UTC) (envelope-from randy@psg.com) Received: from [202.214.86.183] by rip.psg.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68 (FreeBSD)) (envelope-from ) id 1J3opq-000Bot-BN; Sun, 16 Dec 2007 08:26:54 +0000 Message-ID: <4764E14B.6090501@psg.com> Date: Sun, 16 Dec 2007 17:26:51 +0900 From: Randy Bush User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Adam McDougall References: <18275.26892.441538.563720@roam.psg.com> <20071216063153.GO54682@egr.msu.edu> In-Reply-To: <20071216063153.GO54682@egr.msu.edu> X-Enigmail-Version: 0.95.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD Net Subject: Re: ath wep confusion X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 08:26:56 -0000 > ifconfig_ath0="channel 4 ssid rgnet-aden wep wepkey 13-characters mediaopt hostap up" ! thank you. also needed to tell winxp that it was private security not enterprise. 
randy From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 08:55:27 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 959D216A47F for ; Sun, 16 Dec 2007 08:55:27 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from gaia.nimnet.asn.au (nimbin.lnk.telstra.net [139.130.45.143]) by mx1.freebsd.org (Postfix) with ESMTP id 401F313C44B for ; Sun, 16 Dec 2007 08:55:25 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from localhost (smithi@localhost) by gaia.nimnet.asn.au (8.8.8/8.8.8R1.5) with SMTP id TAA09181; Sun, 16 Dec 2007 19:55:09 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Sun, 16 Dec 2007 19:55:08 +1100 (EST) From: Ian Smith To: Randy Bush In-Reply-To: <4764E14B.6090501@psg.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: FreeBSD Net , Adam McDougall Subject: Re: ath wep confusion X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 08:55:27 -0000 On Sun, 16 Dec 2007, Randy Bush wrote: > > ifconfig_ath0="channel 4 ssid rgnet-aden wep wepkey 13-characters mediaopt hostap up" > > ! thank you. Now I'm confused. Isn't that what you already had? > also needed to tell winxp that it was private security not enterprise. Ahah. 
cheers, Ian From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 09:14:52 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD6EE16A419 for ; Sun, 16 Dec 2007 09:14:52 +0000 (UTC) (envelope-from randy@psg.com) Received: from rip.psg.com (rip.psg.com [147.28.0.39]) by mx1.freebsd.org (Postfix) with ESMTP id 7813413C448 for ; Sun, 16 Dec 2007 09:14:52 +0000 (UTC) (envelope-from randy@psg.com) Received: from [202.214.86.183] by rip.psg.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68 (FreeBSD)) (envelope-from ) id 1J3paC-000BsJ-Ew; Sun, 16 Dec 2007 09:14:48 +0000 Message-ID: <4764EC85.70503@psg.com> Date: Sun, 16 Dec 2007 18:14:45 +0900 From: Randy Bush User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Ian Smith References: In-Reply-To: X-Enigmail-Version: 0.95.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , Adam McDougall Subject: Re: ath wep confusion X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 09:14:52 -0000 Ian Smith wrote: >>> ifconfig_ath0="channel 4 ssid rgnet-aden wep wepkey 13-characters mediaopt hostap up" >> ! thank you. 
^deftxkey 1 randy From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 09:32:40 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3F2816A504 for ; Sun, 16 Dec 2007 09:32:40 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from gaia.nimnet.asn.au (nimbin.lnk.telstra.net [139.130.45.143]) by mx1.freebsd.org (Postfix) with ESMTP id 6E00C13C461 for ; Sun, 16 Dec 2007 09:32:38 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from localhost (smithi@localhost) by gaia.nimnet.asn.au (8.8.8/8.8.8R1.5) with SMTP id UAA09815; Sun, 16 Dec 2007 20:32:32 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Sun, 16 Dec 2007 20:32:31 +1100 (EST) From: Ian Smith To: Randy Bush In-Reply-To: <4764EC85.70503@psg.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: FreeBSD Net , Adam McDougall Subject: Re: ath wep confusion X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 09:32:40 -0000 On Sun, 16 Dec 2007, Randy Bush wrote: > Ian Smith wrote: > >>> ifconfig_ath0="channel 4 ssid rgnet-aden wep wepkey 13-characters mediaopt hostap up" > >> ! thank you. 
^deftxkey 1 'k <%^}= From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 16:21:07 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1BD816A46C for ; Sun, 16 Dec 2007 16:21:07 +0000 (UTC) (envelope-from tom@tomjudge.com) Received: from smtp805.mail.ird.yahoo.com (smtp805.mail.ird.yahoo.com [217.146.188.65]) by mx1.freebsd.org (Postfix) with SMTP id 0BCB013C44B for ; Sun, 16 Dec 2007 16:21:06 +0000 (UTC) (envelope-from tom@tomjudge.com) Received: (qmail 21377 invoked from network); 16 Dec 2007 15:54:24 -0000 Received: from unknown (HELO ?192.168.1.2?) (thomasjudge@btinternet.com@86.139.146.42 with plain) by smtp805.mail.ird.yahoo.com with SMTP; 16 Dec 2007 15:54:24 -0000 X-YMail-OSG: vwv9nTQVM1myiwPXjo0O5OiOoXoeTIE0AwJqPLtNK_l99g53 Message-ID: <47654B58.7070500@tomjudge.com> Date: Sun, 16 Dec 2007 15:59:20 +0000 From: Tom Judge User-Agent: Thunderbird 1.5.0.13 (X11/20070824) MIME-Version: 1.0 To: "Bruce M. Simpson" References: <47628E11.7030803@tomjudge.com> <4762AC1E.3030101@FreeBSD.org> <200712142030.14728.max@love2party.net> <4764851F.1000304@FreeBSD.org> In-Reply-To: <4764851F.1000304@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Max Laier , freebsd-pf@freebsd.org, freebsd-net Subject: Re: Spurious error from i[pf]_carp X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 16:21:07 -0000 Bruce M. Simpson wrote: > Max Laier wrote: >> Alternatively you could change IPPROTO_CARP in netinet/in.h to another >> unused protocol number. This is really the preferred way of dealing >> with mixed CARP and VRRP environments as the CARP packets might in >> turn irritate the VRRP routers, too. 
>> This seems to make the most sense to me. At this time it seems (in RELENG_6_2 at least) that because the protocol number is shared with VRRP that tcpdump tries to decode the CARP frames as VRRP frames and although the header/frame is very simple this does not provide a useful decoding of the CARP frame. After the protocol number is changed it would be possible to write a proper carp decoder for tcpdump or at least make any existing decoder be able to tell the difference between VRRP and CARP frames. > This sounds like a common use case. Perhaps there is motivation for > making the protocol number used by CARP a loader tunable? > > [I'd really like it if we had a kernel API for adding the virtual MAC > addresses to ifnet too, then again I'd like the cheat for infinite > chocolate fudge sundaes in life, bed and breakfast at The Savoy with my > choice of actress, etc] >> /* no comment */ >> > No disrespect to anyone intended, just that CARP does duplicate the > functionality of VRRP. > Please correct me if I am wrong, from the limited research I have done, carp was born because Cisco made a patent claim (based on its patents for HSRP) against a VRRP implementation. > It's worth reiterating that this is what happens when software patents > are allowed to creep in to the nuts and bolts of the operational > Internet -- and thus, CARP was born, and thus Tom runs into the issue he > has seen. > > later > BMS > Thoughts? 
Tom From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 18:21:57 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C9D616A41B for ; Sun, 16 Dec 2007 18:21:57 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.185]) by mx1.freebsd.org (Postfix) with ESMTP id 6384213C448 for ; Sun, 16 Dec 2007 18:21:57 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: by rv-out-0910.google.com with SMTP id l15so1749606rvb.43 for ; Sun, 16 Dec 2007 10:21:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=uik7m/RxoAeUjvHsRS8QdHR+3inmaHp1mh3PAQoWCnA=; b=GUhkEAkYazHOxrfvi8qs4KTOJpjXEZRUh3Px/j04gb6lbujy/waoC+gpFPd/9ngdRnQT2PJAdJyHXsVn06ofsEYmuoCEF0U5dAIJXnLZkhrAr25WJELJl2jo/+pyapEEzslT2o3sJGM44KsQP0rnfgcZYX2YYpxVqWwmb5ENj+0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=v7iXIcsoWja9agCD+izqCnBbvI9N7g4C9FfRdv+pdVDItzjq7vD9XapS3FTChXn5JlqmorSbdTLy+4HHB8sRqFh4h446M8TUYZxtEofye2gxYQk/WnrcGt/5XGr1ylxOcoEvpjwcJLJx1ZsSd/z1Q/rIYCzhVkL95b6Xr/1YQ94= Received: by 10.140.170.12 with SMTP id s12mr3356730rve.83.1197829316467; Sun, 16 Dec 2007 10:21:56 -0800 (PST) Received: by 10.141.170.18 with HTTP; Sun, 16 Dec 2007 10:21:56 -0800 (PST) Message-ID: <2e77fc10712161021x378114eeh8cc0b2e0809800db@mail.gmail.com> Date: Sun, 16 Dec 2007 13:21:56 -0500 From: "Niki Denev" Sender: ndenev@gmail.com To: freebsd-net@freebsd.org In-Reply-To: <2e77fc10712140937i19741f9cwe717499b18012a9a@mail.gmail.com> MIME-Version: 
1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2e77fc10712132129o810a608v4ec6a742f9860a63@mail.gmail.com> <47625B80.3090904@FreeBSD.org> <2e77fc10712140937i19741f9cwe717499b18012a9a@mail.gmail.com> X-Google-Sender-Auth: 5658208f0d6c2521 Subject: Re: is carp on if_bridge possible? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 18:21:57 -0000 On Dec 14, 2007 12:37 PM, Niki Denev wrote: > > On Dec 14, 2007 5:31 AM, Bruce M. Simpson wrote: > > > > Niki Denev wrote: > > > Hello, > > > > > > Is this possible? > > > I've tried adding IFT_BRIDGE next to IFT_ETHER and IFT_L2VLAN in ip_carp.c > > > but this probably is not enough. Any ideas? > > > > > > > CARP is 'special' in that it needs to add its own MAC addresses to your > > interface, needs a bit of special cooperation between the IP layer and > > the MAC layer, and it's more than likely that this doesn't work with > > if_bridge. > > > > Like Max says, this is an unusual configuration.... what are you trying > > to do? > > > > BMS > > > > > > I'm trying to setup a highly redundant configuration of > two routers and two rstp capable switches behind them. > Each of the router is connected to each of the switches, > and it's two interfaces are part of a bridge group. > this way i can handle router and/or switch failure without > disconnecting the site. > The problem is that this a remote site which must not go offline by > any means, and thus the unusual setup. > > Hope that this explains it. > > Niki > Maybe using bridge with rstp for failover was not the best idea, and i switched to if_lagg and if_carp on top of it. It seems to work properly and is exactly what i wanted to achieve. 
Thanks, Niki From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 21:31:35 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B343E16A476 for ; Sun, 16 Dec 2007 21:31:35 +0000 (UTC) (envelope-from hhw@astutehosting.com) Received: from defout.telus.net (defout.telus.net [199.185.220.240]) by mx1.freebsd.org (Postfix) with ESMTP id 75EE713C4F0 for ; Sun, 16 Dec 2007 21:31:35 +0000 (UTC) (envelope-from hhw@astutehosting.com) Received: from priv-edtnaa05.telusplanet.net ([154.5.184.6]) by priv-edtnes27.telusplanet.net (InterMail vM.7.08.02.02 201-2186-121-104-20070414) with ESMTP id <20071216210320.CYFN1467.priv-edtnes27.telusplanet.net@priv-edtnaa05.telusplanet.net>; Sun, 16 Dec 2007 14:03:20 -0700 Received: from [192.168.3.9] (d154-5-184-6.bchsia.telus.net [154.5.184.6]) by priv-edtnaa05.telusplanet.net (BorderWare MXtreme Infinity Mail Firewall) with ESMTP id 9BK4ACKE3D; Sun, 16 Dec 2007 14:03:15 -0700 (MST) Message-ID: <47659291.6050809@astutehosting.com> Date: Sun, 16 Dec 2007 13:03:13 -0800 From: Han Hwei Woo User-Agent: Thunderbird 2.0.0.9 (X11/20071123) MIME-Version: 1.0 To: Niki Denev References: <2e77fc10712132129o810a608v4ec6a742f9860a63@mail.gmail.com> <47625B80.3090904@FreeBSD.org> <2e77fc10712140937i19741f9cwe717499b18012a9a@mail.gmail.com> <2e77fc10712161021x378114eeh8cc0b2e0809800db@mail.gmail.com> In-Reply-To: <2e77fc10712161021x378114eeh8cc0b2e0809800db@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org Subject: Re: is carp on if_bridge possible? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 21:31:35 -0000 Hi Niki, I hope I'm understanding you correctly, but the reason you're running if_lag is so that failover will occur even if there is a switch failure? If you enable preempt by setting: sysctl net.inet.carp.preempt=1, and you have a carp running on the routers' interface that goes through the switches, all the carp interfaces would failover in the event of a switch failure, including the external facing one. With bridging or link aggregation, there is nothing to stop a router from staying the master on the external interface, even if the switch it is connected to fails. Cheers, Han Hwei Woo Niki Denev wrote: > On Dec 14, 2007 12:37 PM, Niki Denev wrote: > >> On Dec 14, 2007 5:31 AM, Bruce M. Simpson wrote: >> >>> Niki Denev wrote: >>> >>>> Hello, >>>> >>>> Is this possible? >>>> I've tried adding IFT_BRIDGE next to IFT_ETHER and IFT_L2VLAN in ip_carp.c >>>> but this probably is not enough. Any ideas? >>>> >>>> >>> CARP is 'special' in that it needs to add its own MAC addresses to your >>> interface, needs a bit of special cooperation between the IP layer and >>> the MAC layer, and it's more than likely that this doesn't work with >>> if_bridge. >>> >>> Like Max says, this is an unusual configuration.... what are you trying >>> to do? >>> >>> BMS >>> >>> >>> >> I'm trying to setup a highly redundant configuration of >> two routers and two rstp capable switches behind them. >> Each of the router is connected to each of the switches, >> and it's two interfaces are part of a bridge group. >> this way i can handle router and/or switch failure without >> disconnecting the site. >> The problem is that this a remote site which must not go offline by >> any means, and thus the unusual setup. >> >> Hope that this explains it. 
>> >> Niki >> >> > > > Maybe using bridge with rstp for failover was not the best idea, and i > switched to if_lagg > and if_carp on top of it. > It seems to work properly and is exactly what i wanted to achieve. > > Thanks, > Niki > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > From owner-freebsd-net@FreeBSD.ORG Sun Dec 16 21:59:29 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CA97E16A418 for ; Sun, 16 Dec 2007 21:59:29 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.190]) by mx1.freebsd.org (Postfix) with ESMTP id 8D49413C4DD for ; Sun, 16 Dec 2007 21:59:29 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: by rv-out-0910.google.com with SMTP id l15so1818846rvb.43 for ; Sun, 16 Dec 2007 13:59:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=5kk3FlGYvoXa5nuIlDSxxyriPk7MTFj9By4sMpERXYs=; b=O46NFeBvLE9KDnv1W9Y1F+aTGTbCT8KRs2zixZcSxvQQGydgYKPOuncunYshfPQ2UUE67HB9YxFMCKD3mS+fl2J7+fqvKvUy/70MnRVZ/C4yWuGiYNet+0c1uNFfBnW+TSctnaCkpBJ/E4dVM5xUJRYwfmSO+klBEK0xLyKO/6Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=VYo/KnszEqqLcjTxCuEOwGTD9TVCufT18jO6+P5pjEQgtwah8knq6I2S6y5sTXRGrVSQaGS0bOzVjkku2shcWQDlC9OQ15j8YuRzR1I5+ty2IiYCpiAZTOwcO8N+LODVciX5Hyn+UvIrjKhtvMhZSC233bTE83U/abqLVzS33wM= 
Received: by 10.140.88.42 with SMTP id l42mr3493173rvb.95.1197842369171; Sun, 16 Dec 2007 13:59:29 -0800 (PST) Received: by 10.141.170.18 with HTTP; Sun, 16 Dec 2007 13:59:29 -0800 (PST) Message-ID: <2e77fc10712161359u17ae857flee75401c85516f77@mail.gmail.com> Date: Sun, 16 Dec 2007 16:59:29 -0500 From: "Niki Denev" Sender: ndenev@gmail.com To: "Han Hwei Woo" In-Reply-To: <47659291.6050809@astutehosting.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2e77fc10712132129o810a608v4ec6a742f9860a63@mail.gmail.com> <47625B80.3090904@FreeBSD.org> <2e77fc10712140937i19741f9cwe717499b18012a9a@mail.gmail.com> <2e77fc10712161021x378114eeh8cc0b2e0809800db@mail.gmail.com> <47659291.6050809@astutehosting.com> X-Google-Sender-Auth: 3ecdf5c5f0fbbd68 Cc: freebsd-net@freebsd.org Subject: Re: is carp on if_bridge possible? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2007 21:59:29 -0000 On Dec 16, 2007 4:03 PM, Han Hwei Woo wrote: > > Hi Niki, > > I hope I'm understanding you correctly, but the reason you're running > if_lag is so that failover will occur even if there is a switch failure? > > If you enable preempt by setting: sysctl net.inet.carp.preempt=1, and you > have a carp running on the routers' interface that goes through the > switches, all the carp interfaces would failover in the event of a switch > failure, including the external facing one. With bridging or link > aggregation, there is nothing to stop a router from staying the master on > the external interface, even if the switch it is connected to fails. 
> > > Cheers, > Han Hwei Woo > > Hi Han, Yes, I have net.inet.carp.preempt enabled, but i'm using carp only on the internal interfaces to provide virtual default gateway for the servers behind it. Each of the routers speaks BGP (using openbgpd) to our provider and has it's own fiber uplink. The routers are also directly connected to each other with iBGP and OSPF(using openospfd) so this is not an issue. Regards, Niki From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 04:23:31 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76E5216A417; Mon, 17 Dec 2007 04:23:31 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5426013C45A; Mon, 17 Dec 2007 04:23:31 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBH4NVVc002427; Mon, 17 Dec 2007 04:23:31 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBH4NVw4002423; Mon, 17 Dec 2007 04:23:31 GMT (envelope-from linimon) Date: Mon, 17 Dec 2007 04:23:31 GMT Message-Id: <200712170423.lBH4NVw4002423@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/118722: [tcp] Many old TCP connections in SYN_RCVD state X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 04:23:31 -0000 Synopsis: [tcp] Many old TCP connections in SYN_RCVD state Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon Dec 17 04:22:33 UTC 2007 Responsible-Changed-Why: Reassign to -net. http://www.freebsd.org/cgi/query-pr.cgi?pr=118722 From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 05:21:19 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5CB8916A419 for ; Mon, 17 Dec 2007 05:21:19 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 291EF13C468 for ; Mon, 17 Dec 2007 05:21:19 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 31882 invoked from network); 17 Dec 2007 04:54:38 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 17 Dec 2007 04:54:38 -0000 Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Sun, 16 Dec 2007 23:54:28 -0500 To: freebsd-stable@freebsd.org X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@FreeBSD.org Subject: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 05:21:19 -0000

While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated November 8, 2007 it looks like I've stumbled across a broken driver or kernel routine which stops interrupt processing long enough to severely degrade network performance every 30.99 seconds.

Packets appear to make it as far as ether_input() then get lost.

Test setup:

A - ethernet_switch - B

A sends UDP packets to B through an ethernet switch.
The interface input packet count and output packet count on the switch match what A is sending and B should be receiving. A UDP receiver running on B sees windows of packet loss with a period of 30.99 seconds. The lost packets are counted based on an incrementing sequence number. On an isolated network the Ipkts counter on B matches what A is sending, but the packets never show up in any of the IP/UDP counters or the program trying to receive them.

This behavior can be seen with both em and fxp interfaces. The problem only occurs after the receiving host has been up about a day. Reboot, problem clears. GENERIC kernel, nothing more than default daemons running. Behavior seen on three different motherboards so far.

It also appears this is not just lost network interrupts. Whatever is spinning in the kernel also impacts syscall latency. An easy way to replicate what I'm seeing is to run gettimeofday() in a tight loop and note when the real time syscall delay exceeds some value (which is dependent on processor speed).

As an example on a 3.20GHz CPU a small program will output when the syscall latency is > 5000 usecs. Note the periodic behavior at 30.99 seconds. These big jumps in latency correspond to when packets are being dropped.

usecs (epoch) latency diff

(output from packet loss tester)
window_start/window_end is packet counter
time_start/time_end is absolute time in usecs.
window_diff is # of packets missing

The test is run at about 15.5Kpps / 132Mbits/second, certainly a lot less than this hardware is capable of running BSD4.X.
:missing window_start=311510, time_start=1197861726332008,window_end=311638, time_end=1197861726332011, window_diff=128, time_diff=3 :missing window_start=794482, time_start=1197861757331505,window_end=794609, time_end=1197861757331509, window_diff=127, time_diff=4 :missing window_start=1277313, time_start=1197861788331245,window_end=1277444, time_end=1197861788331249, window_diff=131, time_diff=4 :missing window_start=1760104, time_start=1197861819330625,window_end=1760232, time_end=1197861819330629, window_diff=128, time_diff=4 :missing window_start=2242789, time_start=1197861850330170,window_end=2242916, time_end=1197861850330174, window_diff=127, time_diff=4 :missing window_start=2725818, time_start=1197861881329712,window_end=2725946, time_end=1197861881329715, window_diff=128, time_diff=3 :missing window_start=3208594, time_start=1197861912329261,window_end=3208722, time_end=1197861912329264, window_diff=128, time_diff=3 :missing window_start=3691395, time_start=1197861943328802,window_end=3691522, time_end=1197861943328805, window_diff=127, time_diff=3 :missing window_start=4173793, time_start=1197861974328369,window_end=4173921, time_end=1197861974328373, window_diff=128, time_diff=4 :missing window_start=4656236, time_start=1197862005328176,window_end=4656367, time_end=1197862005328179, window_diff=131, time_diff=3 :missing window_start=5139197, time_start=1197862036327576,window_end=5139325, time_end=1197862036327580, window_diff=128, time_diff=4 :missing window_start=5621958, time_start=1197862067327208,window_end=5622085, time_end=1197862067327211, window_diff=127, time_diff=3 :missing window_start=6104597, time_start=1197862098326839,window_end=6104725, time_end=1197862098326843, window_diff=128, time_diff=4 :missing window_start=6587241, time_start=1197862129326514,window_end=6587369, time_end=1197862129326534, window_diff=128, time_diff=20 :missing window_start=7070051, time_start=1197862160326368,window_end=7070183, time_end=1197862160326371, 
window_diff=132, time_diff=3
:missing window_start=7552828, time_start=1197862191325873, window_end=7552954, time_end=1197862191325876, window_diff=126, time_diff=3
:missing window_start=8035434, time_start=1197862222325572, window_end=8035560, time_end=1197862222325576, window_diff=126, time_diff=4

I'm building a more up to date copy of RELENG_6 to make sure I'm not
chasing something that's been fixed. As a side note this appears to
also be happening on a RELENG_6 build dated Mar 11 2007.

Included is the gettimeofday() looper. Run as ./a.out 1 5000, where
5000 will depend on your system speed. This probably won't provide any
meaningful results on a loaded system. E-mail me off list for a copy
of the packet tester or more diagnostics.

/* Headers required by the calls below. */
#include <sys/types.h>
#include <sys/time.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	struct timeval tv;
	struct timezone tz;
	u_int64_t time_now, time_last, time_mark;
	int quiet, max;

	if (argc != 3)
		errx(1, "Usage: %s <quiet> <max>", argv[0]);

	quiet = atoi(argv[1]);	/* non-zero: print only slow calls */
	max = atoi(argv[2]);	/* latency threshold in usecs */

	gettimeofday(&tv, &tz);
	time_last = (u_int64_t)tv.tv_sec * 1000000LL + (u_int64_t)tv.tv_usec;
	time_mark = 0LL;

	for (;;) {
		gettimeofday(&tv, &tz);
		time_now = (u_int64_t)tv.tv_sec * 1000000LL +
		    (u_int64_t)tv.tv_usec;

		if (!quiet) {
			/* Print every sample: time, latency, time since last print. */
			printf("%llu %llu %llu\n",
			    (unsigned long long)time_now,
			    (unsigned long long)(time_now - time_last),
			    (unsigned long long)(time_now - time_mark));
			time_mark = time_now;
		} else {
			/* Print only when one loop iteration took longer than max. */
			if ((time_now - time_last) > (u_int64_t)max) {
				if (time_mark == 0)
					time_mark = time_now;
				printf("%llu %llu %llu\n",
				    (unsigned long long)time_now,
				    (unsigned long long)(time_now - time_last),
				    (unsigned long long)(time_now - time_mark));
				time_mark = time_now;
			}
		}
		time_last = time_now;
	}
} /* main */

From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 08:51:28 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by
hub.freebsd.org (Postfix) with ESMTP id 8352B16A41B for ; Mon, 17 Dec 2007 08:51:28 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.235]) by mx1.freebsd.org (Postfix) with ESMTP id 50B5413C46E for ; Mon, 17 Dec 2007 08:51:28 +0000 (UTC) (envelope-from ivo.vachkov@gmail.com) Received: by wr-out-0506.google.com with SMTP id 68so1102144wra.13 for ; Mon, 17 Dec 2007 00:51:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=0R/TGxQqRPi9VZlcCnaWFPB0RFKIJCYbhXsFmOoYL7g=; b=MwOC+pp5HxcxIniBy8szANt1vsEAtsNj8EA54qiOWmeoWpaYuDU9XCr9ahdR9aQtFk17OFbh2eP5LjHo+AvfLQ5VXyLLrKmLpB209qGW38fbARw6tryExUXGrfx24wNyPFuakrhrdhCQOLMAfnB1M7bQSJBJGeaL5KXG0d+UHuQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=WpIZLfl1hYclP6EHWuhvhiADw/UTpGTNt0e7G58uDekLHRhTLaqxBazMjcoJyp0g8Knx8X7IYXfpdW2TvtbqV/bs066ZIHP/PziI5+m0EWMZ2g7/0d24E1oMkGzG8LGiTwGtBiasZmZC8sP7VJ7dA+6BJxVVGO5dLW6wjxF7DU8= Received: by 10.150.149.19 with SMTP id w19mr191900ybd.28.1197879795856; Mon, 17 Dec 2007 00:23:15 -0800 (PST) Received: by 10.150.204.13 with HTTP; Mon, 17 Dec 2007 00:23:15 -0800 (PST) Message-ID: Date: Mon, 17 Dec 2007 10:23:15 +0200 From: "Ivo Vachkov" To: "Niki Denev" In-Reply-To: <2e77fc10712131524v706cdec8y18288efe458745c9@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2e77fc10712131524v706cdec8y18288efe458745c9@mail.gmail.com> Cc: freebsd-net@freebsd.org Subject: Re: bridge and stp defaults X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list 
List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 08:51:28 -0000 On Dec 14, 2007 1:24 AM, Niki Denev wrote: > Hi, > > Is there a reason that when adding member ports to a bridge stp is not > enabled by default on them? > Wouldn't it be more intuitive to be enabled by default these days? There are several reasons not to enable STP on a bridge port unless you're absolutely aware of what's happening here: http://unilans.net/phrack/61/p61-0x0c_Fun_with_Spanning_Tree_Protocol.txt > Regards, > Niki > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- "UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity." Dennis Ritchie From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 09:09:46 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 853B116A41B for ; Mon, 17 Dec 2007 09:09:46 +0000 (UTC) (envelope-from krishna.ramdass@gmail.com) Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.185]) by mx1.freebsd.org (Postfix) with ESMTP id 289DA13C4D3 for ; Mon, 17 Dec 2007 09:09:45 +0000 (UTC) (envelope-from krishna.ramdass@gmail.com) Received: by fk-out-0910.google.com with SMTP id b27so1872690fka.11 for ; Mon, 17 Dec 2007 01:09:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type; bh=w2FnIlAr47ZCB/gl2zS/EfwC6TMA5AAkp8zGtDfhjbE=; 
b=jQ9/2Xo4NNNp4f9nCqO7UILMjiF61HpAUh6i4mir8KKcNlt43sDoqGsEFQYX2Vx02Be+6z5f9KuVTaNs0KVSh01gvBuyrHbjZVeVE+fJXOlfwBwEbYH7f+JTDu2LLPXJ70ILr0OMVOaqRz+PLpxpgiquDPdafI+IfUilgx28zXM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:mime-version:content-type; b=aWnLhF+dcX8/06S/9Iu7fl8e24EK/vlYWEIAC+Cj8ECkujPcFczLiLiLh+juJ2ElBLL95iq+CVBeY+lsfCDxoPbbCoZdsQRpwmk68+I3+9wcJ/OmUBGyW+Cdz5PeMbd2au3RvvkC7621s/kFl6ITvqjMQbHQeNv6H/FxJBfdhxI= Received: by 10.82.180.17 with SMTP id c17mr4905977buf.14.1197881005155; Mon, 17 Dec 2007 00:43:25 -0800 (PST) Received: by 10.82.145.3 with HTTP; Mon, 17 Dec 2007 00:43:25 -0800 (PST) Message-ID: <8c1eada80712170043w216b36b7gb5de6a149b952604@mail.gmail.com> Date: Mon, 17 Dec 2007 14:13:25 +0530 From: "Krishna Kumar" To: freebsd-drivers@freebsd.org, freebsd-net@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: WOL suport in Broadcom 5721 (57XX) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 09:09:46 -0000 Hi All, Just a check whether WOL is supported in the Broadcom drivers. Sorry in case this does not interest you. I was just checking whether we have WOL support in Broadcom drivers. I had a look at the current source and could not find the support. Is this in the list of todo's?? Can this feature not be supported due to design issues? Is somebody trying this out somewhere? Please do copy me on the reply as I am not subscribed to the list. 
--
Thanks and Best Regards,
KK

From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 10:11:18 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E26F16A41B; Mon, 17 Dec 2007 10:11:18 +0000 (UTC) (envelope-from mux@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 5D92613C46A; Mon, 17 Dec 2007 10:11:18 +0000 (UTC) (envelope-from mux@freebsd.org) Received: by elvis.mu.org (Postfix, from userid 1920) id D90231A4D7C; Mon, 17 Dec 2007 02:10:09 -0800 (PST) Date: Mon, 17 Dec 2007 11:10:09 +0100 From: Maxime Henrion To: Julian Elischer Message-ID: <20071217101009.GL71713@elvis.mu.org> References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="kfjH4zxOES6UT95V" Content-Disposition: inline In-Reply-To: <4762DD82.9070904@elischer.org> User-Agent: Mutt/1.4.2.3i Cc: Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 10:11:18 -0000

--kfjH4zxOES6UT95V
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Julian Elischer wrote:
> Gleb Smirnoff wrote:
> >On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
> >J> Maxime Henrion wrote:
> >J> > Replying to myself on this one, sorry about that.
> >J> > I said in my previous mail that I didn't know yet what process was
> >J> > holding the lock of the rtentry that the routed process is dealing
> >J> > with in rt_setgate(), and I just could verify that it is held by
> >J> > the swi1: net thread.
> >J> > So, in a nutshell:
> >J> > - The routed process does its business on the routing socket, that ends up
> >J> >   calling rt_setgate(). While in rt_setgate() it drops the lock on its
> >J> >   rtentry in order to call rtalloc1(). At this point, the routed
> >J> >   process hold the gateway route (rtalloc1() returns it locked), and it
> >J> >   now tries to re-lock the original rtentry.
> >J> > - At the same time, the swi net thread calls arpresolve() which ends up
> >J> >   calling rt_check(). Then rt_check() locks the rtentry, and tries to
> >J> >   lock the gateway route.
> >J> > A classical case of deadlock with mutexes because of different locking
> >J> > order. Now, it's not obvious to me how to fix it :-).
> >J>
> >J> On failure to re-lock, the routed call to rt_setgate should completely abort
> >J> and restart from scratch, releasing all locks it has on the way out.
> >
> >Do you suggest mtx_trylock?
>
> I think that would be the cleanest way..

So, here's what I've got. I have yet to test it at all, I hope that
I'll be able to do so today, or tomorrow. Any input appreciated.

Cheers,
Maxime

--kfjH4zxOES6UT95V
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="route-deadlock.patch"

diff -Nru /sys/net/route.c net/route.c
--- /sys/net/route.c	Tue Oct 30 19:07:54 2007
+++ net/route.c	Mon Dec 17 11:05:56 2007
@@ -996,6 +996,7 @@
 	struct radix_node_head *rnh = rt_tables[dst->sa_family];
 	int dlen = SA_SIZE(dst), glen = SA_SIZE(gate);
 
+again:
 	RT_LOCK_ASSERT(rt);
 
 	/*
@@ -1029,7 +1030,16 @@
 			RT_REMREF(rt);
 			return (EADDRINUSE); /* failure */
 		}
-		RT_LOCK(rt);
+		/*
+		 * Try to reacquire the lock on rt, and if it fails,
+		 * clean state and restart from scratch.
+		 */
+		ok = RT_TRYLOCK(rt);
+		if (!ok) {
+			RTFREE_LOCKED(gwrt);
+			RT_LOCK(rt);
+			goto again;
+		}
 		/*
 		 * If there is already a gwroute, then drop it. If we
 		 * are asked to replace route with itself, then do
diff -Nru /sys/net/route.h net/route.h
--- /sys/net/route.h	Tue Apr  4 22:07:23 2006
+++ net/route.h	Fri Dec 14 11:47:48 2007
@@ -289,6 +289,7 @@
 #define	RT_LOCK_INIT(_rt) \
 	mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK)
 #define	RT_LOCK(_rt)		mtx_lock(&(_rt)->rt_mtx)
+#define	RT_TRYLOCK(_rt)		mtx_trylock(&(_rt)->rt_mtx)
 #define	RT_UNLOCK(_rt)		mtx_unlock(&(_rt)->rt_mtx)
 #define	RT_LOCK_DESTROY(_rt)	mtx_destroy(&(_rt)->rt_mtx)
 #define	RT_LOCK_ASSERT(_rt)	mtx_assert(&(_rt)->rt_mtx, MA_OWNED)

--kfjH4zxOES6UT95V--

From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 10:38:08 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9A79316A418 for ; Mon, 17 Dec 2007 10:38:08 +0000 (UTC) (envelope-from raffaele.delorenzo@libero.it) Received: from grupposervizi.it (mail1.tagetik.com [85.18.71.243]) by mx1.freebsd.org (Postfix) with SMTP id E2F0A13C4D3 for ; Mon, 17 Dec 2007 10:38:07 +0000 (UTC) (envelope-from raffaele.delorenzo@libero.it) Received: (qmail 3459 invoked by uid 453); 17 Dec 2007 10:11:24 -0000 Received: from [192.9.210.26] (HELO noel.grupposervizi.it) (192.9.210.26) by grupposervizi.it (qpsmtpd/0.31.1) with ESMTP; Mon, 17 Dec 2007 11:11:24 +0100 Message-ID: <47664B4B.4050805@libero.it> Date: Mon, 17 Dec 2007 11:11:23 +0100 From: Raffaele De Lorenzo User-Agent: Thunderbird 2.0.0.9 (X11/20071204) MIME-Version: 1.0 To: John E Hein References: <4759022A.4020105@libero.it> <47599AE1.6060805@elischer.org> <475D2185.3090405@libero.it> <868x4291ap.fsf@ds4.des.no> <475D417D.5020303@libero.it> <18273.25559.26231.178154@gromit.timing.com> In-Reply-To: <18273.25559.26231.178154@gromit.timing.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org, "raffaele.delorenzo" , net@freebsd.org, Julian Elischer , security@freebsd.org Subject:
Re: Added native socks support to libc in FreeBSD 7 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 10:38:08 -0000 John E Hein wrote: > Raffaele De Lorenzo wrote at 14:39 +0100 on Dec 10, 2007: > > You can see in the port-tree my project "csocks" and > > http://csocks.altervista.org. > > Thanks for lettings us know about your project. Here are > just a few comments. > > Why don't you provide the source code in the port? > > For an open source, security sensitive project such as this, I think > that's important for users to gain confidence in it. > > > As far as putting the code in the base FreeBSD, that's a pretty large > hurdle. The FreeBSD maintainers tend to put something in base only > after a significant part of the user base uses it, and it has become > the [or a] de facto preferred implementation of some industry > standard. > > SOCKS is a standard, but the csocks implementation is not (yet). > Continue to adhere to RFCs and grow your user base, and perhaps > inclusion in FreeBSD's base system will happen organically. > > For things to go into the base system ... > > 1) The software (and its developers) need a proven track record > (which you can gain by getting a large user base in ports). > Personally, I hadn't heard about your SOCKS implementation until > this week. > > 2) A significant number of FreeBSD users can't do without it. Now, > this is quite subjective. In some sense, people can't do without > a web browser in this day and age, but there's no browser in the > FreeBSD base system. Of course, comparing firefox to csocks is > not fair. Maybe grep is a better comparison. Web browsers are > monstrous. > > 3) There is a significant benefit to having it tightly integrated > with the base system (as opposed to a more loose integration in > the ports tree). 
Wireless LAN is perhaps a good example here (and
>    for #2 for that matter). Not everyone needs it, but when you do
>    it is good to have it in the base system where it is given
>    system level architecture love and care.
>
> 4) You need someone with commit privs to shepherd this thing along
>    _and_ agreement from lots of other people (including FreeBSD's
>    core). Hint: the freebsd-arch list is often a good place to
>    discuss additions to the FreeBSD base.
>
> 5) Lots of other criteria (both implied and explicitly documented)
>    that I'll not go into further (everyone together: "Hear, Hear").
>
> Note that the larger the base system becomes, the harder it is to
> maintain it well as a core, well integrated body of work. And once it
> is in the base, more people are now automatically signed on to
> maintain it (indirectly)... not just you anymore. When someone makes
> a change to the base tcp implementation, for instance, they have to
> make sure it also doesn't break the shiny new socks code now in the
> base system as well. This probably won't be a significant burden in
> this particular case, but it's something that people have to consider.
>
>
> As far as your specific patch to add socks support to libc ...
>
> Why not just make a patch that puts it in src/lib/libsocks? And a
> binary in src/usr.bin/csocks (that does the LD_PRELOAD dance to
> preload libsocks)? Why does it have to be in libc?
>
> I don't speak for the FreeBSD project, but that's a few of my thoughts
> after looking at your implementation... which I did since it tickled
> my curiosity. Keep up the good work.
> .
>

Hi,
many thanks for your interest.
Socks is a protocol used (in my experience) a lot in some banks for
security reasons, so it has a large impact on network security.
Recent versions of the IBM AIX OS introduced native socks support. The
IBM socks implementation is inside the AIX libc (AIX 4 has the socks5
library in libc.a already); in fact, no external socks libraries are
preloaded, and to socksify an application you must insert a socks rule
in a particular configuration file (the default is "/etc/socks5c.conf").
The AIX native socks mode is much appreciated by its users, so my idea
of adding native socks support inside the libc in FreeBSD (which I think
is a very good, secure OS!) is motivated by these considerations.

This is a comparison table, "AIX SOCKS" vs "CSOCKS":

The IBM AIX Socks implementation:

1) doesn't support Socks V4
2) doesn't support GSS-API Authentication
3) supports IPv6
4) doesn't support Socks V5 User Authentication
5) doesn't support Socks under UDP
6) supports sample Socks V5 connect and bind
7) The configuration file doesn't support detailed rules (you cannot
   specify the port and the protocol to socksify... for details you can
   see
   http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/files/aixfiles/socks5c.conf.htm)

The CSOCKS Socks implementation:

1) supports Socks V4 Connect and Bind
2) supports Socks V5 Connect and Bind
3) supports the Socks V5 Sample User Authentication method
4) supports Socks V5 under UDP
5) The configuration file supports detailed rules (you can see:
   http://csocks.altervista.org/doc.htm)
6) doesn't support IPv6 (under development)
7) doesn't support GSS-API Authentication (under development)

The source code of the "csocks/port version" is practically the same as
the source code for the FreeBSD native support (the link is:
http://csocks.altervista.org/download/FreeBSD_libc.tar.gz).

I have now posted this discussion to the FreeBSD arch mailing list
(thanks for your advice).

Raffaele From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 10:39:37 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D01816A476; Mon, 17 Dec 2007 10:39:37 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 5BC3313C44B; Mon, 17 Dec 2007 10:39:37 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBHAdamq056533; Mon, 17 Dec 2007 02:39:36 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBHAdaju056532; Mon, 17 Dec 2007 02:39:36 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Mon, 17 Dec 2007 02:39:36 -0800 From: David G Lawrence To: Mark Fullmer Message-ID: <20071217103936.GR25053@tnn.dglawrence.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Mon, 17 Dec 2007 02:39:36 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 10:39:37 -0000 One more comment on my last email... The patch that I included is not meant as a real fix - it is just a bandaid. The real problem appears to be that a very large number of vnodes (all of them?) are getting synced (i.e. calling ffs_syncvnode()) every time. 
This should normally only happen for dirty vnodes. I suspect that
something is broken with this check:

	if (vp->v_type == VNON || ((ip->i_flag &
	    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
	    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
		VI_UNLOCK(vp);
		continue;
	}

...like the i_flag flags aren't ever getting properly cleared (or bv_cnt is
always non-zero).

...but I don't have the time to chase this down.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 10:59:19 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C50E916A420; Mon, 17 Dec 2007 10:59:19 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id B390513C45A; Mon, 17 Dec 2007 10:59:19 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBHAOY1Y046966; Mon, 17 Dec 2007 02:24:34 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBHAOX2L046965; Mon, 17 Dec 2007 02:24:33 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Mon, 17 Dec 2007 02:24:33 -0800 From: David G Lawrence To: Mark Fullmer Message-ID: <20071217102433.GQ25053@tnn.dglawrence.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Mon, 17 Dec 2007 02:24:34 -0800 (PST) Cc: freebsd-net@freebsd.org,
freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 10:59:19 -0000

> While trying to diagnose a packet loss problem in a RELENG_6 snapshot
> dated November 8, 2007 it looks like I've stumbled across a broken
> driver or kernel routine which stops interrupt processing long enough
> to severely degrade network performance every 30.99 seconds.

   I noticed this as well some time ago. The problem has to do with the
processing (syncing) of vnodes. When the total number of allocated vnodes
in the system grows to tens of thousands, the ~31 second periodic sync
process takes a long time to run. Try this patch and let people know if
it helps your problem. It will periodically wait for one tick (1ms) every
500 vnodes of processing, which will allow other things to run.

Index: ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.290.2.16
diff -c -r1.290.2.16 ffs_vfsops.c
*** ufs/ffs/ffs_vfsops.c        9 Oct 2006 19:47:17 -0000       1.290.2.16
--- ufs/ffs/ffs_vfsops.c        25 Apr 2007 01:58:15 -0000
***************
*** 1109,1114 ****
--- 1109,1115 ----
        int softdep_deps;
        int softdep_accdeps;
        struct bufobj *bo;
+       int flushed_count = 0;

        fs = ump->um_fs;
        if (fs->fs_fmod != 0 && fs->fs_ronly != 0) {            /* XXX */
***************
*** 1174,1179 ****
--- 1175,1184 ----
                                allerror = error;
                vput(vp);
                MNT_ILOCK(mp);
+               if (flushed_count++ > 500) {
+                       flushed_count = 0;
+                       msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
+               }
        }
        MNT_IUNLOCK(mp);
        /*

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 11:07:01 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA71716A417 for ; Mon, 17 Dec 2007 11:07:01 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id ED84B13C478 for ; Mon, 17 Dec 2007 11:07:01 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBHB71OC088295 for ; Mon, 17 Dec 2007 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBHB71dx088291 for freebsd-net@FreeBSD.org; Mon, 17 Dec 2007 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 17 Dec 2007 11:07:01 GMT Message-Id: <200712171107.lBHB71dx088291@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-net@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 11:07:02 -0000 Current FreeBSD problem reports Critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- f kern/115360 net [ipv6] IPv6 address and if_bridge don't play well toge 1 problem total. Serious problems S Tracker Resp. 
Description
--------------------------------------------------------------------------------
a kern/38554   net changing interface ipaddress doesn't seem to work
s kern/39937   net ipstealth issue
f kern/62374   net panic: free: multiple frees
s kern/81147   net [net] [patch] em0 reinitialization while adding aliase
o kern/92552   net A serious bug in most network drivers from 5.X to 6.X
s kern/95665   net [if_tun] "ping: sendto: No buffer space available" wit
s kern/105943  net Network stack may modify read-only mbuf chain copies
o kern/106316  net [dummynet] dummynet with multipass ipfw drops packets
o kern/108542  net [bce]: Huge network latencies with 6.2-RELEASE / STABL
o kern/110959  net [ipsec] Filtering incoming packets with enc0 does not
o kern/112528  net [nfs] NFS over TCP under load hangs with "impossible p
o kern/112686  net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38
o kern/112722  net IP v4 udp fragmented packet reject
o kern/113457  net [ipv6] deadlock occurs if a tunnel goes down while the
o kern/113842  net [ipv6] PF_INET6 proto domain state can't be cleared wi
o kern/114714  net [gre][patch] gre(4) is not MPSAFE and does not support
o kern/114839  net [fxp] fxp looses ability to speak with traffic
o kern/115239  net [ipnat] panic with 'kmem_map too small' using ipnat
o kern/116077  net 6.2-STABLE panic during use of multi-cast networking c
o kern/116172  net Network / ipv6 recursive mutex panic
o kern/116185  net if_iwi driver leads system to reboot
o kern/116328  net [bge]: Solid hang with bge interface
o kern/116747  net [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile
o kern/116837  net ifconfig tunX destroy: panic
o kern/117271  net [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap
o kern/117423  net Duplicate IP on different interfaces
o bin/117448   net [carp] 6.2 kernel crash
o kern/117717  net [panic] Kernel panic with Bittorrent client.

28 problems total.

Non-critical problems

S Tracker      Resp.
Description
--------------------------------------------------------------------------------
o conf/23063   net [PATCH] for static ARP tables in rc.network
s bin/41647    net ifconfig(8) doesn't accept lladdr along with inet addr
o kern/54383   net [nfs] [patch] NFS root configurations without dynamic
s kern/60293   net FreeBSD arp poison patch
o kern/95267   net packet drops periodically appear
f kern/95277   net [netinet] [patch] IP Encapsulation mask_match() return
o kern/100519  net [netisr] suggestion to fix suboptimal network polling
o kern/102035  net [plip] plip networking disables parallel port printing
o conf/102502  net [patch] ifconfig name does't rename netgraph node in n
o conf/107035  net [patch] bridge interface given in rc.conf not taking a
o kern/112654  net [pcn] Kernel panic upon if_pcn module load on a Netfin
o kern/114915  net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f
o bin/116643   net [patch] fstat(1): add INET/INET6 socket details as in
o bin/117339   net [patch] route(8): loading routing management commands
o kern/118722  net [tcp] Many old TCP connections in SYN_RCVD state
o kern/118727  net [ng] [patch] add new ng_pf module

16 problems total.
From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 13:16:55 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB22F16A419; Mon, 17 Dec 2007 13:16:55 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from smtp6.yandex.ru (smtp6.yandex.ru [213.180.200.197]) by mx1.freebsd.org (Postfix) with ESMTP id EB24013C442; Mon, 17 Dec 2007 13:16:54 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from ns.kirov.so-cdu.ru ([77.72.136.145]:39674 "EHLO [127.0.0.1]" smtp-auth: "bu7cher" TLS-CIPHER: "DHE-RSA-AES256-SHA keybits 256/256 version TLSv1/SSLv3" TLS-PEER-CN1: ) by mail.yandex.ru with ESMTP id S5473548AbXLQNDg (ORCPT + 1 other); Mon, 17 Dec 2007 16:03:36 +0300 X-Yandex-Spam: 1 X-Yandex-Front: smtp6 X-Yandex-TimeMark: 1197896616 X-MsgDayCount: 4 X-Comment: RFC 2476 MSA function at smtp6.yandex.ru logged sender identity as: bu7cher Message-ID: <476673A6.8050306@yandex.ru> Date: Mon, 17 Dec 2007 16:03:34 +0300 From: "Andrey V. Elsukov" User-Agent: Mozilla Thunderbird 1.5 (FreeBSD/20051231) MIME-Version: 1.0 To: Krishna Kumar References: <8c1eada80712170043w216b36b7gb5de6a149b952604@mail.gmail.com> In-Reply-To: <8c1eada80712170043w216b36b7gb5de6a149b952604@mail.gmail.com> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, freebsd-drivers@freebsd.org Subject: Re: WOL suport in Broadcom 5721 (57XX) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 13:16:56 -0000 Krishna Kumar wrote: > Is this in the list of todo's?? > Can this feature not be supported due to design issues? > Is somebody trying this out somewhere? > Please do copy me on the reply as I am not subscribed to the list. 
Look into freebsd-hackers@ mail archive. In the previous month there was a discussion about WOL support. Look to topics: 1. FreeBSD WOL sis on 2. How to add wake on lan support for your card And as i remember, Sam Leffer has made some work for WOL support. -- WBR, Andrey V. Elsukov From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 13:36:00 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1130216A417 for ; Mon, 17 Dec 2007 13:36:00 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from rv-out-0910.google.com (rv-out-0910.google.com [209.85.198.191]) by mx1.freebsd.org (Postfix) with ESMTP id ED0C613C4E8 for ; Mon, 17 Dec 2007 13:35:59 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: by rv-out-0910.google.com with SMTP id l15so2097614rvb.43 for ; Mon, 17 Dec 2007 05:35:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=QbbR7rg6lgJDSGuoht0izsL8zsp0ybbNwS0LNjXPkeU=; b=wpCwGUPfuqw0+6wU2d57UX+kM2CqjeetI3CDx0SU4LnVAarpKVYHK52liCgqcMx/upkK6jh3dMumLJ6bmNxwGv/ZR7no5yP+osg/WwcrEVxat3wfGbztc2Wtn4Atg3thYsoUMbpI5g0UWVf8QzVKkuGUivQZzvgqC99q3j5WKSg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=sw4yXuAO9bBTPd0bB3OHHV7PCIFYxSSLW4fJA3Cjv4X1MWkq1JznocWEVOFQHPMSqroxqyWiNRnKqlvvnuobbX37KomF2Dj6aQIAR3M6MC9o7xotLAZZH5EYo5tOLlqiDgxoDFXtdMUIpFYSwnl6uaCQvOELqGA3EGiaF2IEq/Q= Received: by 10.141.172.6 with SMTP id z6mr3975059rvo.80.1197898558851; Mon, 17 Dec 2007 05:35:58 -0800 (PST) Received: by 10.141.170.18 with HTTP; Mon, 17 Dec 2007 05:35:58 
-0800 (PST) Message-ID: <2e77fc10712170535l448b097em7271127baf039588@mail.gmail.com> Date: Mon, 17 Dec 2007 08:35:58 -0500 From: "Niki Denev" Sender: ndenev@gmail.com To: "Ivo Vachkov" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <2e77fc10712131524v706cdec8y18288efe458745c9@mail.gmail.com> X-Google-Sender-Auth: 761b943057bed76d Cc: freebsd-net@freebsd.org Subject: Re: bridge and stp defaults X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 13:36:00 -0000 On Dec 17, 2007 3:23 AM, Ivo Vachkov wrote: > > On Dec 14, 2007 1:24 AM, Niki Denev wrote: > > Hi, > > > > Is there a reason that when adding member ports to a bridge stp is not > > enabled by default on them? > > Wouldn't it be more intuitive to be enabled by default these days? > > There are several reasons not to enable STP on a bridge port unless > you're absolutely aware of what's happening here: > > http://unilans.net/phrack/61/p61-0x0c_Fun_with_Spanning_Tree_Protocol.txt > > > Regards, > > Niki > > _______________________________________________ > > freebsd-net@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-net > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > > > > -- > "UNIX is basically a simple operating system, but you have to be a > genius to understand the simplicity." Dennis Ritchie > > I was asking this question because all of the ethernet switches that i have worked with (Cisco/3Com) have R/STP enabled by default (if they support it of course). 
From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 13:44:07 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 645E616A477 for ; Mon, 17 Dec 2007 13:44:07 +0000 (UTC) (envelope-from zhouzhouyi@FreeBSD.org) Received: from ercist.iscas.ac.cn (ercist.iscas.ac.cn [124.16.138.3]) by mx1.freebsd.org (Postfix) with SMTP id 5EC6E13C45B for ; Mon, 17 Dec 2007 13:44:06 +0000 (UTC) (envelope-from zhouzhouyi@FreeBSD.org) Received: (qmail 53165 invoked by uid 98); 17 Dec 2007 13:15:40 -0000 Received: from 210.77.2.97 by ercist.iscas.ac.cn (envelope-from , uid 89) with qmail-scanner-1.25 (spamassassin: 3.1.0. Clear:RC:1(210.77.2.97):SA:0(-0.7/8.0):. Processed in 2.996309 secs); 17 Dec 2007 13:15:40 -0000 X-Spam-Status: No, hits=-0.7 required=8.0 X-Qmail-Scanner-Mail-From: zhouzhouyi@FreeBSD.org via ercist.iscas.ac.cn X-Qmail-Scanner: 1.25 (Clear:RC:1(210.77.2.97):SA:0(-0.7/8.0):. Processed in 2.996309 secs) Received: from unknown (HELO zzy.H.qngy.gscas) (zhouzhouyi@ercist.iscas.ac.cn@210.77.2.97) by 0 with SMTP; 17 Dec 2007 13:15:37 -0000 Date: Mon, 17 Dec 2007 21:15:45 +0800 From: zhouyi zhou To: freebsd-net@freebsd.org Message-Id: <20071217211545.3a28981a.zhouzhouyi@FreeBSD.org> Organization: Institute of Software X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; i386-portbld-freebsd5.4) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: using netgraph to create a pair of pseudo ethernet interface X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 13:44:07 -0000 Dear All, Any one know how to us netgraph to create a pair of pseudo ethernet interface. The packet go out from one and in another. 
Thanks alot Zhouyi Zhou From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 17:57:21 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD07A16A417 for ; Mon, 17 Dec 2007 17:57:21 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 8E6DD13C45B for ; Mon, 17 Dec 2007 17:57:21 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 54024 invoked from network); 17 Dec 2007 17:57:20 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 17 Dec 2007 17:57:20 -0000 In-Reply-To: <20071217054305.GA18268@eos.sc1.parodius.com> References: <20071217054305.GA18268@eos.sc1.parodius.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Mon, 17 Dec 2007 12:57:05 -0500 To: Jeremy Chadwick X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@FreeBSD.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 17:57:21 -0000 Back to back test with no ethernet switch between two em interfaces, same result. The receiving side has been up > 1 day and exhibits the problem. These are also two different servers. The small gettimeofday() syscall tester also shows the same ~30 second pattern of high latency between syscalls. 
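For readers who want to reproduce the gettimeofday() observation, here
is a minimal sketch of that kind of tester. It is not the program used
for the measurements in this thread; the function names watch_gaps and
tv_delta_us and the choice of threshold are assumptions made for the
illustration. It calls gettimeofday() back to back and reports any gap
between consecutive calls that reaches a given number of microseconds.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Microseconds elapsed from *a to *b. */
static long
tv_delta_us(const struct timeval *a, const struct timeval *b)
{
	return ((long)(b->tv_sec - a->tv_sec) * 1000000L +
	    (long)(b->tv_usec - a->tv_usec));
}

/*
 * Call gettimeofday() `iterations` times in a tight loop, print every
 * gap of at least threshold_us microseconds, and return the largest
 * gap seen (0 if no positive gap was observed).
 */
long
watch_gaps(long iterations, long threshold_us)
{
	struct timeval prev, now;
	long max_gap = 0;

	gettimeofday(&prev, NULL);
	for (long i = 0; i < iterations; i++) {
		gettimeofday(&now, NULL);
		long gap = tv_delta_us(&prev, &now);
		if (gap >= threshold_us)
			printf("%ld.%06ld: gap of %ld us\n",
			    (long)now.tv_sec, (long)now.tv_usec, gap);
		if (gap > max_gap)
			max_gap = gap;
		prev = now;
	}
	return (max_gap);
}
```

Running something like watch_gaps() for a few minutes with a threshold
of a few milliseconds should make a periodic stall visible as a line of
output roughly every 31 seconds, if the machine is affected.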
Receiver test application reports 3699 missed packets

Sender netstat -i:

(before test)
em1 1500 00:04:23:cf:51:b7 20 0 15975785 0 0
em1 1500 10.1/24 10.1.0.2 37 - 15975801 - -

(after test)
em1 1500 00:04:23:cf:51:b7 22 0 25975822 0 0
em1 1500 10.1/24 10.1.0.2 39 - 25975838 - -

total IP packets sent during test = end - start
    25975838 - 15975801 = 10000037
    (expected: 10,000,000 test packets + overhead)

Receiver netstat -i:

(before test)
em1 1500 00:04:23:c4:cc:89 15975785 0 21 0 0
em1 1500 10.1/24 10.1.0.1 15969626 - 19 - -

(after test)
em1 1500 00:04:23:c4:cc:89 25975822 0 23 0 0
em1 1500 10.1/24 10.1.0.1 25965964 - 21 - -

total ethernet frames received during test = end - start
    25975822 - 15975785 = 10000037 (as expected)

total IP packets processed during test = end - start
    25965964 - 15969626 = 9996338 (expecting 10000037)

Missed packets = expected - received
    10000037 - 9996338 = 3699

netstat -i accounts for the 3699 missed packets also reported by the
application.

Looking closer at the tester output again shows the periodic ~30 second
windows of packet loss.

There's a second problem here in that packets are just disappearing
before they make it to ip_input(), or there's a dropped packets counter
I've not found yet.

I can provide remote access to anyone who wants to take a look, this is
very easy to duplicate. The ~ 1 day uptime before the behavior surfaces
is not making this easy to isolate.

--
mark

On Dec 17, 2007, at 12:43 AM, Jeremy Chadwick wrote:

> On Mon, Dec 17, 2007 at 12:21:43AM -0500, Mark Fullmer wrote:
>> While trying to diagnose a packet loss problem in a RELENG_6
>> snapshot dated November 8, 2007 it looks like I've stumbled across
>> a broken driver or kernel routine which stops interrupt processing
>> long enough to severely degrade network performance every 30.99
>> seconds.
>>
>> Packets appear to make it as far as ether_input() then get lost.
> > Are you sure this isn't being caused by something the switch is doing, > such as MAC/ARP cache clearing or LACP? I'm just speculating, but it > would be worthwhile to remove the switch from the picture (crossover > cable to the rescue). > > I know that at least in the case of fxp(4) and em(4), Jack Vogel does > some through testing of throughput using a professional/high-end > packet > generator (some piece of hardware, I forget the name...) > > -- > | Jeremy Chadwick jdc at > parodius.com | > | Parodius Networking http:// > www.parodius.com/ | > | UNIX Systems Administrator Mountain View, > CA, USA | > | Making life hard for others since 1977. PGP: > 4BD6C0CB | > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable- > unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 18:03:41 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D7DD16A417 for ; Mon, 17 Dec 2007 18:03:41 +0000 (UTC) (envelope-from zec@icir.org) Received: from xaqua.tel.fer.hr (xaqua.tel.fer.hr [161.53.19.25]) by mx1.freebsd.org (Postfix) with ESMTP id 27E5813C508 for ; Mon, 17 Dec 2007 18:03:41 +0000 (UTC) (envelope-from zec@icir.org) Received: by xaqua.tel.fer.hr (Postfix, from userid 20006) id 525A29B64D; Mon, 17 Dec 2007 18:32:47 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on xaqua.tel.fer.hr X-Spam-Level: X-Spam-Status: No, score=-4.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.7 Received: from [192.168.200.112] (unknown [161.53.19.16]) by xaqua.tel.fer.hr (Postfix) with ESMTP id B133D9B648 for ; Mon, 17 Dec 2007 18:32:46 +0100 (CET) From: Marko Zec To: freebsd-net@freebsd.org Date: Mon, 17 Dec 2007 18:32:41 +0100 User-Agent: KMail/1.9.7 References: 
<20071217211545.3a28981a.zhouzhouyi@FreeBSD.org> In-Reply-To: <20071217211545.3a28981a.zhouzhouyi@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200712171832.41370.zec@icir.org> Subject: Re: using netgraph to create a pair of pseudo ethernet interface X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 18:03:41 -0000 On Monday 17 December 2007 14:15:45 zhouyi zhou wrote: > Dear All, > Any one know how to us netgraph to create a pair of pseudo > ethernet interface. The packet go out from one and in another. tpx32# ngctl mkpeer eiface ether ether tpx32# ngctl mkpeer eiface ether ether tpx32# ngctl l There are 3 total nodes: Name: ngctl1446 Type: socket ID: 00000006 Num hooks: 0 Name: ngeth1 Type: eiface ID: 00000005 Num hooks: 0 Name: ngeth0 Type: eiface ID: 00000003 Num hooks: 0 tpx32# ngctl connect ngeth0: ngeth1: ether ether tpx32# ngctl l There are 3 total nodes: Name: ngctl1448 Type: socket ID: 00000008 Num hooks: 0 Name: ngeth1 Type: eiface ID: 00000005 Num hooks: 1 Name: ngeth0 Type: eiface ID: 00000003 Num hooks: 1 tpx32# From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 18:27:02 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10E3E16A417 for ; Mon, 17 Dec 2007 18:27:02 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outX.internet-mail-service.net (outX.internet-mail-service.net [216.240.47.247]) by mx1.freebsd.org (Postfix) with ESMTP id F2DA113C478 for ; Mon, 17 Dec 2007 18:27:01 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; 
Mon, 17 Dec 2007 10:26:57 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 470A3126CE5; Mon, 17 Dec 2007 10:26:57 -0800 (PST) Message-ID: <4766BF72.7000005@elischer.org> Date: Mon, 17 Dec 2007 10:26:58 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Maxime Henrion References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> In-Reply-To: <20071217101009.GL71713@elvis.mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 18:27:02 -0000 Maxime Henrion wrote: > Julian Elischer wrote: >> Gleb Smirnoff wrote: >>> On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote: >>> J> Maxime Henrion wrote: >>> J> > Replying to myself on this one, sorry about that. >>> J> > I said in my previous mail that I didn't know yet what process was >>> J> > holding the lock of the rtentry that the routed process is dealing >>> J> > with in rt_setgate(), and I just could verify that it is held by >>> J> > the swi1: net thread. >>> J> > So, in a nutshell: >>> J> > - The routed process does its business on the routing socket, that >>> ends up >>> J> > calling rt_setgate(). While in rt_setgate() it drops the lock on >>> its >>> J> > rtentry in order to call rtalloc1(). At this point, the routed >>> J> > process hold the gateway route (rtalloc1() returns it locked), and >>> it >>> J> > now tries to re-lock the original rtentry. 
>>> J> > - At the same time, the swi net thread calls arpresolve() which
>>> J> >   ends up calling rt_check(). Then rt_check() locks the rtentry,
>>> J> >   and tries to lock the gateway route.
>>> J> > A classical case of deadlock with mutexes because of different
>>> J> > locking order. Now, it's not obvious to me how to fix it :-).
>>> J>
>>> J> On failure to re-lock, the routed call to rt_setgate should
>>> J> completely abort and restart from scratch, releasing all locks it
>>> J> has on the way out.
>>>
>>> Do you suggest mtx_trylock?
>> I think that would be the cleanest way..
>
> So, here's what I've got. I have yet to test it at all, I hope that
> I'll be able to do so today, or tomorrow. Any input appreciated.
>
> Cheers,
> Maxime
>

This code is, I think (from memory), called only from the user, right?
It is possible that on failure to lock one might delay for 1 tick or
something.. (I don't have the code in front of me right now.)
Otherwise I think that might do the job.. More comments later.
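The try-lock-and-restart approach discussed above can be sketched in
userland with POSIX mutexes. This is an illustration of the general
technique only, not the routing code and not the patch under
discussion; pthread_mutex_trylock() stands in for mtx_trylock(), and
the function name lock_pair_backoff is hypothetical. The caller takes
the first lock, tries the second without blocking, and on failure
releases everything and starts over, so it never sleeps while holding
one lock and waiting for the other.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Acquire both mutexes without risking a lock-order deadlock against
 * a thread that takes them in the opposite order. Returns the number
 * of times we had to back off; on return both mutexes are held.
 */
int
lock_pair_backoff(pthread_mutex_t *first, pthread_mutex_t *second)
{
	int retries = 0;

	for (;;) {
		pthread_mutex_lock(first);
		if (pthread_mutex_trylock(second) == 0)
			return (retries);	/* both locks held */
		/*
		 * Someone else holds `second`. Drop what we have so
		 * that thread can make progress, then start again.
		 */
		pthread_mutex_unlock(first);
		retries++;
		sched_yield();	/* or sleep briefly, e.g. for one tick */
	}
}
```

The price of this scheme is that any state examined under the first
lock must be re-validated after each restart, since it may have changed
while the lock was dropped.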
From owner-freebsd-net@FreeBSD.ORG Mon Dec 17 19:10:21 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 311DB16A420 for ; Mon, 17 Dec 2007 19:10:21 +0000 (UTC) (envelope-from mav@mavhome.dp.ua) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id B766913C4D1 for ; Mon, 17 Dec 2007 19:10:19 +0000 (UTC) (envelope-from mav@mavhome.dp.ua) X-Spam-Flag: SKIP X-Spam-Yversion: Spamooborona 1.7.0 Received: from [212.86.226.226] (account mav@alkar.net HELO [192.168.3.2]) by cmail.optima.ua (CommuniGate Pro SMTP 5.1.10) with ESMTPA id 56533750; Mon, 17 Dec 2007 21:10:19 +0200 Message-ID: <4766C996.308@mavhome.dp.ua> Date: Mon, 17 Dec 2007 21:10:14 +0200 From: Alexander Motin User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: zhouyi zhou References: <1197912191.00843848.1197899401@10.7.7.3> In-Reply-To: <1197912191.00843848.1197899401@10.7.7.3> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: using netgraph to create a pair of pseudo ethernet interface X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2007 19:10:21 -0000 zhouyi zhou wrote: > Any one know how to us netgraph to create a pair of pseudo ethernet interface. The packet > go out from one and in another. man 4 ng_eiface , man 4 netgraph ? 
-- Alexander Motin From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 03:37:42 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 819D516A418 for ; Tue, 18 Dec 2007 03:37:42 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 417FA13C45A for ; Tue, 18 Dec 2007 03:37:42 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 42358 invoked from network); 18 Dec 2007 03:37:41 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 18 Dec 2007 03:37:41 -0000 In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Mon, 17 Dec 2007 22:37:25 -0500 To: David G Lawrence X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 03:37:42 -0000 Thanks. Have a kernel building now. It takes about a day of uptime after reboot before I'll see the problem. -- mark On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote: >> While trying to diagnose a packet loss problem in a RELENG_6 snapshot >> dated >> November 8, 2007 it looks like I've stumbled across a broken >> driver or >> kernel routine which stops interrupt processing long enough to >> severly >> degrade network performance every 30.99 seconds. 
> > I noticed this as well some time ago. The problem has to do with > the > processing (syncing) of vnodes. When the total number of allocated > vnodes > in the system grows to tens of thousands, the ~31 second periodic sync > process takes a long time to run. Try this patch and let people > know if > it helps your problem. It will periodically wait for one tick (1ms) > every > 500 vnodes of processing, which will allow other things to run. > > Index: ufs/ffs/ffs_vfsops.c > =================================================================== > RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v > retrieving revision 1.290.2.16 > diff -c -r1.290.2.16 ffs_vfsops.c > *** ufs/ffs/ffs_vfsops.c 9 Oct 2006 19:47:17 -0000 1.290.2.16 > --- ufs/ffs/ffs_vfsops.c 25 Apr 2007 01:58:15 -0000 > *************** > *** 1109,1114 **** > --- 1109,1115 ---- > int softdep_deps; > int softdep_accdeps; > struct bufobj *bo; > + int flushed_count = 0; > > fs = ump->um_fs; > if (fs->fs_fmod != 0 && fs->fs_ronly != 0) { /* XXX */ > *************** > *** 1174,1179 **** > --- 1175,1184 ---- > allerror = error; > vput(vp); > MNT_ILOCK(mp); > + if (flushed_count++ > 500) { > + flushed_count = 0; > + msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1); > + } > } > MNT_IUNLOCK(mp); > /* > > -DG > > David G. Lawrence > President > Download Technologies, Inc. - http://www.downloadtech.com - (866) > 399 8500 > The FreeBSD Project - http://www.freebsd.org > Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 05:43:52 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E25916A418 for ; Tue, 18 Dec 2007 05:43:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by mx1.freebsd.org (Postfix) with ESMTP id 2852013C457 for ; Tue, 18 Dec 2007 05:43:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBI5heff027162 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Dec 2007 16:43:43 +1100 Date: Tue, 18 Dec 2007 16:43:40 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com> Message-ID: <20071218155642.D32807@delplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 05:43:52 -0000 On Mon, 17 Dec 2007, David G Lawrence wrote: >> While trying to diagnose a packet loss problem in a RELENG_6 snapshot >> dated >> November 8, 2007 it looks like I've stumbled across a broken driver or >> kernel routine which stops interrupt processing long enough to severly >> degrade network performance every 30.99 seconds. 
I see the same behaviour under a heavily modified version of FreeBSD-5.2 (except the period was 2 ms longer and the latency was 7 ms instead of 11 ms when numvnodes was at a certain value). Now with numvnodes = 17500, the latency is 3 ms.

> I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.

However, the syncer should be running at a relatively low priority and not cause packet loss. I don't see any packet loss even in ~5.2 where the network stack (but not drivers) is still Giant-locked.

Other too-high latencies showed up:

- syscons LED setting and vt switching gives a latency of 5.5 msec because syscons still uses busy-waiting for setting LEDs :-(. Oops, I do see packet loss -- this causes it under ~5.2 but not under -current. For the bge and/or em drivers, the packet loss shows up in netstat output as a few hundred errors for every LED setting on the receiving machine, while receiving tiny packets at the maximum possible rate of 640 kpps. sysctl is completely Giant-locked and so are upper layers of the network stack. The bge hardware rx ring size is 256 in -current and 512 in ~5.2. At 640 kpps, 512 packets take 800 us so bge wants to call the upper layers with a latency of far below 800 us. I don't know exactly where the upper layers block on Giant.

- a user CPU hog process gives a latency of over 200 ms every half a second or so when the hog starts up, and 300-400 ms after the hog has been running for some time. Two user CPU hog processes double the latency. Reducing kern.sched.quantum from 100 ms to 10 ms and/or renicing the hogs don't seem to affect this.
Running the hogs at idle priority fixes this. This won't affect packet loss, but it might affect user network processes -- they might need to run at real time priority to get low enough latency. They might need to do this anyway -- a scheduling quantum of 100 ms should give a latency of 100 ms per CPU hog quite often, though not usually since the hogs should never be preferred to a higher-priority process.

Previously I've used a less specialized clock-watching program to determine the syscall latency. It showed similar problems for CPU hogs. I just remembered that I found the fix for these under ~5.2 -- remove a local hack that sacrifices latency for reduced context switches between user threads. -current with SCHED_4BSD does this non-hackishly, but seems to have a bug somewhere that gives a latency that is large enough to be noticeable in interactive programs.

Bruce

From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 06:20:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27E8916A41A; Tue, 18 Dec 2007 06:20:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id AAEE513C47E; Tue, 18 Dec 2007 06:20:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBI6KUvC013824 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Dec 2007 17:20:31 +1100 Date: Tue, 18 Dec 2007 17:20:30 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071217103936.GR25053@tnn.dglawrence.com> Message-ID: <20071218170133.X32807@delplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com>
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 06:20:38 -0000

On Mon, 17 Dec 2007, David G Lawrence wrote:

> One more comment on my last email... The patch that I included is not
> meant as a real fix - it is just a bandaid. The real problem appears to
> be that a very large number of vnodes (all of them?) are getting synced
> (i.e. calling ffs_syncvnode()) every time. This should normally only
> happen for dirty vnodes. I suspect that something is broken with this
> check:
>
>     if (vp->v_type == VNON || ((ip->i_flag &
>         (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
>         vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
>             VI_UNLOCK(vp);
>             continue;
>     }

Isn't it just the O(N) algorithm with N quite large? Under ~5.2, on a 2.2GHz A64 UP in 32-bit mode, I see a latency of 3 ms for 17500 vnodes, which would be explained by the above (and the VI_LOCK() and loop overhead) taking 171 ns per vnode. I would expect it to take more like 20 ns per vnode for UP and 60 for SMP.

The comment before this code shows that the problem is known, and says that a subroutine call cannot be afforded unless there is work to do, but the locking accesses look like subroutine calls, have subroutine calls in their internals, and take longer than simple subroutine calls in the SMP case even when they don't make subroutine calls. (IIRC, on A64 a minimal subroutine call takes 4 cycles while a minimal locked instruction takes 18 cycles; subroutine calls are only slow when their branches are mispredicted.)
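For anyone checking the numbers, the per-vnode figure is just the measured latency divided by the vnode count, and it can be converted to CPU cycles at the quoted 2.2GHz clock. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and integer division rounds down.

```c
#include <stdint.h>

/* Hypothetical helper: average cost per item in nanoseconds, rounded down. */
uint64_t
ns_per_item(uint64_t total_ns, uint64_t nitems)
{
	if (nitems == 0)
		return (0);
	return (total_ns / nitems);
}

/* Hypothetical helper: convert nanoseconds to cycles at a clock in MHz. */
uint64_t
ns_to_cycles(uint64_t ns, uint64_t mhz)
{
	return (ns * mhz / 1000);
}
```

3 ms over 17500 vnodes gives 171 ns per vnode, which is roughly 376 cycles at 2200 MHz. That is large next to the 4-cycle call and 18-cycle locked instruction mentioned above.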
Bruce From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 07:18:23 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3208C16A419 for ; Tue, 18 Dec 2007 07:18:23 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.freebsd.org (Postfix) with ESMTP id C3D2213C4D9 for ; Tue, 18 Dec 2007 07:18:22 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11]) (authenticated bits=0) by pooker.samsco.org (8.13.8/8.13.8) with ESMTP id lBI6slA5077050; Mon, 17 Dec 2007 23:54:48 -0700 (MST) (envelope-from scottl@samsco.org) Message-ID: <47676E96.4030708@samsco.org> Date: Mon, 17 Dec 2007 23:54:14 -0700 From: Scott Long User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4 MIME-Version: 1.0 To: Bruce Evans References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> In-Reply-To: <20071218170133.X32807@delplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (pooker.samsco.org [168.103.85.57]); Mon, 17 Dec 2007 23:54:48 -0700 (MST) X-Spam-Status: No, score=-1.4 required=5.4 tests=ALL_TRUSTED autolearn=failed version=3.1.8 X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on pooker.samsco.org Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 07:18:23 -0000 Bruce Evans wrote: > On Mon, 17 Dec 2007, David G 
Lawrence wrote: > >> One more comment on my last email... The patch that I included is not >> meant as a real fix - it is just a bandaid. The real problem appears to >> be that a very large number of vnodes (all of them?) are getting synced >> (i.e. calling ffs_syncvnode()) every time. This should normally only >> happen for dirty vnodes. I suspect that something is broken with this >> check: >> >> if (vp->v_type == VNON || ((ip->i_flag & >> (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && >> vp->v_bufobj.bo_dirty.bv_cnt == 0)) { >> VI_UNLOCK(vp); >> continue; >> } > > Isn't it just the O(N) algorithm with N quite large? Under ~5.2, on > a 2.2GHz A64 UP in 32-bit mode, I see a latency of 3 ms for 17500 vnodes, > which would be explained by the above (and the VI_LOCK() and loop > overhead) taking 171 ns per vnode. I would expect it to take more like > 20 ns per vnode for UP and 60 for SMP. > > The comment before this code shows that the problem is known, and says > that a subroutine call cannot be afforded unless there is work to do, > but the, the locking accesses look like subroutine calls, have subroutine > calls in their internals, and take longer than simple subroutine calls > in the SMP case even when they don't make subroutine calls. (IIRC, on > A64 a minimal subroutine call takes 4 cycles while a minimal locked > instructions takes 18 cycles; subroutine calls are only slow when their > branches are mispredicted.) > > Bruce Right, it's a non-optimal loop when N is very large, and that's a fairly well understood problem. I think what DG was getting at, though, is that this massive flush happens every time the syncer runs, which doesn't seem correct. Sure, maybe you just rsynced 100,000 files 20 seconds ago, so the upcoming flush is going to be expensive. But the next flush 30 seconds after that shouldn't be just as expensive, yet it appears to be so. 
This is further supported by the original poster's claim that it takes many hours of uptime before the problem becomes noticeable. If vnodes are never truly getting cleaned, or never getting their flags cleared so that this loop knows that they are clean, then it's feasible that they'll accumulate over time, keep on getting flushed every 30 seconds, keep on bogging down the loop, and so on. Scott From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 09:51:30 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E231416A475; Tue, 18 Dec 2007 09:51:30 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from mx4.mail.ru (fallback.mail.ru [194.67.57.14]) by mx1.freebsd.org (Postfix) with ESMTP id D09A213C4EC; Tue, 18 Dec 2007 09:51:29 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from mx30.mail.ru (mx30.mail.ru [194.67.23.238]) by mx4.mail.ru (mPOP.Fallback_MX) with ESMTP id 9A14A10B366A; Tue, 18 Dec 2007 12:20:53 +0300 (MSK) Received: from [78.140.2.237] (port=26518 helo=nuclight.avtf.net) by mx30.mail.ru with esmtp id 1J4Yd9-000FkY-00; Tue, 18 Dec 2007 12:20:51 +0300 Date: Tue, 18 Dec 2007 15:20:48 +0600 To: "freebsd-ipfw@freebsd.org" , "freebsd-current@freebsd.org" , "freebsd-net@freebsd.org" , "freebsd-stable@freebsd.org" From: "Vadim Goncharov" Organization: AVTF TPU Hostel Content-Type: multipart/mixed; boundary=----------2UNlkgEOncIwckbiXYzy0f MIME-Version: 1.0 Message-ID: User-Agent: Opera M2/7.54 (Win32, build 3865) Cc: maxim@freebsd.org, mlaier@freebsd.org, phk@freebsd.org Subject: [PATCH] ipfwpcap(8) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 09:51:31 -0000 ------------2UNlkgEOncIwckbiXYzy0f Content-Type: text/plain; format=flowed; 
delsp=yes; charset=koi8-r Content-Transfer-Encoding: 8bit

Hi,

I've recently found a patch (also available at http://antigreen.org/vadim/freebsd/ipfwpcap/) that my friend and I made in January for ipfwpcap(8), which was introduced in 7.0. It adds more features, some of which were already present in its pflogd(8) counterpart. The patched version was tested with up to about 200 parallel processes, on both 5.5 and 6.2, for half a year without any bugs.

If possible, could it be committed to the upcoming 7.0-RELEASE? It would be nice not to break POLA after the release becomes stable and widely available (the meaning of some options was changed, though to be more consistent with pflogd and more FreeBSD-ish overall), but I forgot to post it earlier, before the 7.0-STABLE fork, sorry. Please.

List of changes:

1. The program now daemonizes itself by default, and the -d option not only enables debugging but also cancels daemonizing.
2. The log is now re-opened on SIGHUP; if the log pathname is not absolute, the program will not chdir("/") after daemonizing.
3. The log is now flushed on SIGALRM; the new option -i can be used to specify the flush interval (using alarm(3)); the default is 60 seconds.
4. Added option -z, which resets the log-limiting counters to zero on each log re-open.
5. Added pid-file checking: if the file exists, check whether a process with its value still exists (ignoring signal 0 ourselves); if not, rewrite the stale pid-file and begin working.
6. Signal handlers now only set variables; all work is done in the main loop, which changed from for(;;) to while(!quit).
7. Minor changes: fewer global variables, strcpy() changed to strlcpy(), some macros added, less output from usage (as we now have a manpage), most exit codes changed from custom ones to sysexits(3).
8. More style(9); new features are documented in the man page, and some old statements in the man page were made more detailed.
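Item 6 in the list above (handlers only set variables, the main loop does the work) can be sketched as follows. This is a minimal illustration and not the code from the attached patch: the names `want_quit`, `want_flush`, `want_reopen`, `install_handlers`, `take_pending` and `should_quit` are hypothetical. It uses signal(3) as the patch does; sigaction(2) is the more portable choice when handler persistence matters.

```c
#include <signal.h>

/* Flags set by the handlers; volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t want_quit = 0;
static volatile sig_atomic_t want_flush = 0;
static volatile sig_atomic_t want_reopen = 0;

/* Handlers do nothing except record that the signal arrived. */
static void on_term(int sig)  { (void)sig; want_quit = 1; }
static void on_alarm(int sig) { (void)sig; want_flush = 1; }
static void on_hup(int sig)   { (void)sig; want_reopen = 1; }

/* Install the handlers; returns 0 on success, -1 on failure. */
int
install_handlers(void)
{
	if (signal(SIGTERM, on_term) == SIG_ERR ||
	    signal(SIGINT, on_term) == SIG_ERR ||
	    signal(SIGHUP, on_hup) == SIG_ERR ||
	    signal(SIGALRM, on_alarm) == SIG_ERR)
		return (-1);
	return (0);
}

/*
 * Called from the main loop: consume pending requests.
 * Returns a bitmask: 1 = flush requested, 2 = reopen requested.
 */
int
take_pending(void)
{
	int what = 0;

	if (want_flush) {
		want_flush = 0;
		what |= 1;
	}
	if (want_reopen) {
		want_reopen = 0;
		what |= 2;
	}
	return (what);
}

/* Loop condition for the main loop, in the spirit of while(!quit). */
int
should_quit(void)
{
	return (want_quit != 0);
}
```

The main loop would call take_pending() at the top of each iteration, before blocking in select(), and do the actual flush or reopen there. Keeping the handlers this small avoids calling functions that are not async-signal-safe from signal context.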
-- WBR, Vadim Goncharov ------------2UNlkgEOncIwckbiXYzy0f Content-Disposition: attachment; filename=ipfwpcap.patch Content-Type: application/octet-stream; name=ipfwpcap.patch Content-Transfer-Encoding: 8bit --- ipfwpcap.c.orig Sat Nov 11 05:22:06 2006 +++ ipfwpcap.c Tue Dec 19 08:30:00 2006 @@ -1,95 +1,120 @@ +/*- + * Copyright (c) 2004 University of Toronto. All rights reserved. + * Anyone may use or copy this software except that this copyright + * notice remain intact and that credit is given where it is due. + * The University of Toronto and the author make no warranty and + * accept no liability for this software. + * + * $FreeBSD: /repoman/r/ncvs/src/usr.sbin/ipfwpcap/ipfwpcap.c,v 1.2 2006/09/04 19:30:44 sam Exp $ + */ + /* - * copy diverted (or tee'd) packets to a file in 'tcpdump' format + * Copy diverted (or tee'd) packets to a file in 'tcpdump' format * (ie. this uses the '-lpcap' routines). * - * example usage: - * # ipfwpcap -r 8091 divt.log & + * Example usage: + * # ipfwpcap -r 8091 divt.log * # ipfw add 2864 divert 8091 ip from 128.432.53.82 to any * # ipfw add 2864 divert 8091 ip from any to 128.432.53.82 * * the resulting dump file can be read with ... * # tcpdump -nX -r divt.log - */ -/* - * Written by P Kern { pkern [AT] cns.utoronto.ca } - * - * Copyright (c) 2004 University of Toronto. All rights reserved. - * Anyone may use or copy this software except that this copyright - * notice remain intact and that credit is given where it is due. - * The University of Toronto and the author make no warranty and - * accept no liability for this software. 
* - * From: Header: /local/src/local.lib/SRC/ipfwpcap/RCS/ipfwpcap.c,v 1.4 2004/01/15 16:19:07 pkern Exp - * - * $FreeBSD: /repoman/r/ncvs/src/usr.sbin/ipfwpcap/ipfwpcap.c,v 1.2 2006/09/04 19:30:44 sam Exp $ + * Written by P Kern { pkern [AT] cns.utoronto.ca } + * Adopted by V Pavluk (vladvic_r@mail.ru) + * - changed sighup handler to reopen log file + * - added sigalrm handler to flush data + * - some of exit() codes changed to sysexits(3) + * Major code reworking by Vadim Goncharov + * - signals and daemonizing rewritten + * - style(9) reformat, more sysexits(3) and other cleanups + * - enabled own alarm sending, changed options and updated man page */ +#include +#include +#include /* For MAXPATHLEN */ +#include +#include + +#include +#include /* For IP_MAXPACKET */ +#include /* For IP_MAXPACKET */ + #include #include #include #include #include #include -#include -#include -#include /* for MAXPATHLEN */ -#include -#include - -#include /* for IP_MAXPACKET */ -#include /* for IP_MAXPACKET */ +#include /* XXX normally defined in config.h */ -#define HAVE_STRLCPY 1 -#define HAVE_SNPRINTF 1 -#define HAVE_VSNPRINTF 1 -#include /* see pcap(3) and /usr/src/contrib/libpcap/. */ +#define HAVE_STRLCPY 1 +#define HAVE_SNPRINTF 1 +#define HAVE_VSNPRINTF 1 +#include /* See pcap(3) and /usr/src/contrib/libpcap/. */ #ifdef IP_MAXPACKET -#define BUFMAX IP_MAXPACKET +#define BUFMAX IP_MAXPACKET #else -#define BUFMAX 65535 +#define BUFMAX 65535 #endif #ifndef MAXPATHLEN #define MAXPATHLEN 1024 #endif -static int debug = 0; -static int reflect = 0; /* 1 == write packet back to socket. */ +#define MAXPIDLEN 9 /* Max decimal pid length in characters. */ +#define DEFINTERVAL 60 /* How often to do flush (in seconds). */ -static ssize_t totbytes = 0, maxbytes = 0; -static ssize_t totpkts = 0, maxpkts = 0; +static char *prog = NULL; +static char pidfile[MAXPATHLEN] = { '\0' }; -char *prog = NULL; -char pidfile[MAXPATHLEN] = { '\0' }; +static int quit = 0; /* Is it time to exit? 
*/ +static int do_flush = 0; /* Time to flush log. */ +static int do_reopen = 0; /* Time for log rotating. */ +static int flush_interval = DEFINTERVAL; /* - * tidy up. + * Tidy up macro. */ -void -quit(sig) -int sig; +#define QUIT(code) do { \ + pcap_dump_flush(dp); \ + (void)unlink(pidfile); \ + exit(code); \ +} while(0); + +void +quit_sig(int sig) +{ + quit = 1; +} + +void +flush_log(int sigalrm) +{ + do_flush = 1; + alarm(flush_interval); +} + +void +reopen_log(int sighup) { - (void) unlink(pidfile); - exit(sig); + do_reopen = 1; } /* - * do the "paper work" - * - save my own pid in /var/run/$0.{port#}.pid + * Do the "paper work". + * - fork and detach from terminal, if needed. + * - save my own pid in /var/run/$0.{port#}.pid. */ -okay(pn) -int pn; +void +okay(int pn, int detach, int nochdir) { - FILE *fp; - int fd, numlen, n; - char *p, numbuf[80]; - - numlen = sizeof(numbuf); - bzero(numbuf, numlen); - snprintf(numbuf, numlen-1, "%ld\n", getpid()); - numlen = strlen(numbuf); + int pf; + char *p, strpid[MAXPIDLEN + 1]; + pid_t pid; if (pidfile[0] == '\0') { p = (char *)rindex(prog, '/'); @@ -99,93 +124,158 @@ "%s%s.%d.pid", _PATH_VARRUN, p, pn); } - fd = open(pidfile, O_WRONLY|O_CREAT|O_EXCL, 0644); - if (fd < 0) { perror(pidfile); exit(21); } + pf = open(pidfile, O_WRONLY | O_CREAT | O_EXCL | O_EXLOCK, 0644); - siginterrupt(SIGTERM, 1); - siginterrupt(SIGHUP, 1); - signal (SIGTERM, quit); - signal (SIGHUP, quit); - - n = write(fd, numbuf, numlen); - if (n < 0) { perror(pidfile); quit(23); } - (void) close(fd); + /* + * We couldn't create pid file + */ + if (pf == -1) { + if (errno == EEXIST) { + /* + * If it is because it's already exists + */ + bzero(strpid, MAXPIDLEN + 1); + fprintf(stderr, "PID file already exists!\n"); + + /* + * Try to read the PID stored in the existing file + */ + pf = open(pidfile, O_RDONLY); + if (pf == -1) { + perror("Error opening PID file for reading"); + exit(EX_IOERR); + } + if (read(pf, strpid, MAXPIDLEN) < 0) { + perror("Error 
reading PID file"); + exit(EX_IOERR); + } + pid = atol(strpid); + close(pf); + + /* + * We found PID, try to determine, whether process + * is running + */ + if (kill(pid, 0) == 0) { + /* + * Signal is delivered, though process with + * such PID exists + */ + fprintf(stderr, "%s already running with PID=%d, exiting...\n", prog, pid); + exit(1); + } else { + /* + * It seems, like the process is killed, so + * we can proceed... + */ + fprintf(stderr, "Stale PID file, overwriting...\n"); + pf = open(pidfile, O_WRONLY | O_TRUNC | O_EXLOCK); + if (pf == -1) { + perror("Error opening PID file for writing"); + exit(EX_IOERR); + } + } + } else { + perror("Error creating PID file"); + exit(EX_IOERR); + } + } + + if (detach) { + if (daemon(nochdir, 0) != 0) { + close(pf); + (void)unlink(pidfile); + perror("daemon"); + exit(EX_OSERR); + } + } + + /* + * Set signal handlers and system behaviour. This must be done + * before saving PID to prevent small, but possible race condition + * when another instance failed to create PID, reads it and tries + * to send signal to us. + */ + siginterrupt(SIGTERM, 1); + siginterrupt(SIGHUP, 1); + + /* Ignore 0th signal, or process may be killed with it by default... */ + signal(0, SIG_IGN); + signal(SIGINT, quit_sig); + signal(SIGTERM, quit_sig); + signal(SIGHUP, reopen_log); + signal(SIGALRM, flush_log); + + /* Save our PID to pidfile. */ + bzero(strpid, MAXPIDLEN + 1); + snprintf(strpid, MAXPIDLEN, "%ld\n", getpid()); + if (write(pf, strpid, strlen(strpid)) < 0) { + perror("Error writing PID file"); + exit(EX_IOERR); + } + close(pf); } +void usage() { - fprintf(stderr, "\ -\n\ -usage:\n\ - %s [-dr] [-b maxbytes] [-p maxpkts] [-P pidfile] portnum dumpfile\n\ -\n\ -where:\n\ - '-d' = enable debugging messages.\n\ - '-r' = reflect. write packets back to the divert socket.\n\ - (ie. 
simulate the original intent of \"ipfw tee\").\n\ - '-rr' = indicate that it is okay to quit if packet-count or\n\ - byte-count limits are reached (see the NOTE below\n\ - about what this implies).\n\ - '-b bytcnt' = stop dumping after {bytcnt} bytes.\n\ - '-p pktcnt' = stop dumping after {pktcnt} packets.\n\ - '-P pidfile' = alternate file to store the PID\n\ - (default: /var/run/%s.{portnum}.pid).\n\ -\n\ - portnum = divert(4) socket port number.\n\ - dumpfile = file to write captured packets (tcpdump format).\n\ - (specify '-' to write packets to stdout).\n\ -\n\ -", prog, prog); - - fprintf(stderr, "\ -The '-r' option should not be necessary, but because \"ipfw tee\" is broken\n\ -(see BUGS in ipfw(8) for details) this feature can be used along with\n\ -an \"ipfw divert\" rule to simulate the original intent of \"ipfw tee\".\n\ -\n\ -NOTE: With an \"ipfw divert\" rule, diverted packets will silently\n\ - disappear if there is nothing listening to the divert socket.\n\ -\n\ -"); - exit(-1); + fprintf(stderr, + "usage: %s [-dz] [-r | -rr] [-i flush_interval] [-b maxbytes] [-p maxpkts] [-P pidfile] portnum dumpfile\n", + prog); + + exit(EX_USAGE); } -main(ac, av) -int ac; -char *av[]; +main(int argc, char *argv[]) { int r, sd, portnum, l; - struct sockaddr_in sin; - int errflg = 0; + struct sockaddr_in sin; + int errflg = 0, zeroize = 0; int nfd; fd_set rds; ssize_t nr; - char *dumpf, buf[BUFMAX]; + char buf[BUFMAX]; + + int debug = 0; + int reflect = 0; /* 1 == write packet back to socket. */ + + ssize_t totbytes = 0, maxbytes = 0; + ssize_t totpkts = 0, maxpkts = 0; - pcap_t *p; - pcap_dumper_t *dp; struct pcap_pkthdr phd; + pcap_t *p; + pcap_dumper_t *dp; /* Global, as signal handlers may want it. 
*/ + char *dumpf; - prog = av[0]; + prog = argv[0]; - while ((r = getopt(ac, av, "drb:p:P:")) != -1) { + while ((r = getopt(argc, argv, "drzb:i:p:P:")) != -1) { switch (r) { case 'd': - debug++; + debug = 1; break; case 'r': reflect++; break; + case 'i': + flush_interval = atoi(optarg); + if ((flush_interval < 5) || (flush_interval > 3600)) + flush_interval = DEFINTERVAL; + break; case 'b': - maxbytes = (ssize_t) atol(optarg); + maxbytes = (ssize_t)atol(optarg); break; case 'p': - maxpkts = (ssize_t) atoi(optarg); + maxpkts = (ssize_t)atoi(optarg); + break; + case 'z': + zeroize = 1; break; case 'P': - strcpy(pidfile, optarg); + strlcpy(pidfile, optarg, sizeof(pidfile)); break; case '?': default: @@ -194,17 +284,18 @@ } } - if ((ac - optind) != 2 || errflg) + if (((argc - optind) != 2) || errflg) usage(); - portnum = atoi(av[optind++]); - dumpf = av[optind]; + portnum = atoi(argv[optind++]); + dumpf = argv[optind]; -if (debug) fprintf(stderr, "bind to %d.\ndump to '%s'.\n", portnum, dumpf); + if (debug) + fprintf(stderr, "bind to %d.\ndump to '%s'.\n", portnum, dumpf); if ((r = socket(PF_INET, SOCK_RAW, IPPROTO_DIVERT)) == -1) { perror("socket(DIVERT)"); - exit(2); + exit(EX_OSERR); } sd = r; @@ -214,92 +305,136 @@ if (bind(sd, (struct sockaddr *)&sin, sizeof(sin)) == -1) { perror("bind(divert)"); - exit(3); + exit(EX_OSERR); } p = pcap_open_dead(DLT_RAW, BUFMAX); dp = pcap_dump_open(p, dumpf); if (dp == NULL) { pcap_perror(p, dumpf); - exit(4); + exit(EX_OSFILE); } + + /* + * We will not chdir() to root directory if user specified + * non-absolute pathname to logfile, because in this case + * logfile will be created in another directory after first + * reopening on SIGHUP. + */ + okay(portnum, !debug, dumpf[0] == '/' ? 0 : 1); - okay(portnum); + alarm(flush_interval); /* Start timer. */ nfd = sd + 1; - for (;;) { + while (!quit) { + /* + * Handle signal actions on next iteration after select()'s EINTR. 
+ */ + if (do_flush) { + if (debug) + fprintf(stderr, "Flushing log.\n"); + pcap_dump_flush(dp); + do_flush = 0; + } + + if (do_reopen) { + if (debug) + fprintf(stderr, "Reopening log.\n"); + pcap_dump_close(dp); + dp = pcap_dump_open(p, dumpf); + if (zeroize) { + totbytes = 0; + totpkts = 0; + } + do_reopen = 0; + } + + /* Prepare for select(). */ FD_ZERO(&rds); FD_SET(sd, &rds); r = select(nfd, &rds, NULL, NULL, NULL); if (r == -1) { - if (errno == EINTR) continue; + if (errno == EINTR) + continue; perror("select"); - quit(11); + QUIT(EX_OSERR); } - if (!FD_ISSET(sd, &rds)) - /* hmm. no work. */ - continue; + continue; /* Hmm. No work. */ /* - * use recvfrom(3 and sendto(3) as in natd(8). - * see /usr/src/sbin/natd/natd.c - * see ipfw(8) about using 'divert' and 'tee'. + * Use recvfrom(3 and sendto(3) as in natd(8). + * See /usr/src/sbin/natd/natd.c. + * See ipfw(8) about using 'divert' and 'tee'. */ /* - * read packet. + * Read packet. */ l = sizeof(sin); nr = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&sin, &l); -if (debug) fprintf(stderr, "recvfrom(%d) = %d (%d)\n", sd, nr, l); - if (nr < 0 && errno != EINTR) { + + if (debug) + fprintf(stderr, "recvfrom(%d) = %d (%d)\n", sd, nr, l); + + if ((nr < 0) && (errno != EINTR)) { perror("recvfrom(sd)"); - quit(12); + QUIT(EX_IOERR); } - if (nr <= 0) continue; + if (nr <= 0) + continue; - if (reflect) { + if (reflect > 0) { /* - * write packet back so it can continue - * being processed by any further IPFW rules. + * Write packet back so it can continue being + * processed by any further IPFW rules. */ l = sizeof(sin); r = sendto(sd, buf, nr, 0, (struct sockaddr *)&sin, l); -if (debug) fprintf(stderr, " sendto(%d) = %d\n", sd, r); - if (r < 0) { perror("sendto(sd)"); quit(13); } + if (debug) + fprintf(stderr, "sendto(%d) = %d\n", sd, r); + if (r < 0) { + perror("sendto(sd)"); + QUIT(EX_IOERR); + } } /* - * check maximums, if any. - * but don't quit if must continue reflecting packets. 
+ * Check maximums, if any. But don't quit if must continue + * reflecting packets. However, it's ok to exit when + * reflect > 1. */ if (maxpkts) { totpkts++; if (totpkts > maxpkts) { - if (reflect == 1) continue; - quit(0); + if (reflect == 1) + continue; + QUIT(EX_OK); } } if (maxbytes) { totbytes += nr; if (totbytes > maxbytes) { - if (reflect == 1) continue; - quit(0); + if (reflect == 1) + continue; + QUIT(EX_OK); } } /* - * save packet in tcpdump(1) format. see pcap(3). - * divert packets are fully assembled. see ipfw(8). + * Save packet in tcpdump(1) format. See pcap(3). Divert + * packets are fully assembled, see ipfw(8). */ - (void) gettimeofday(&(phd.ts), NULL); + (void)gettimeofday(&(phd.ts), NULL); phd.caplen = phd.len = nr; pcap_dump((u_char *)dp, &phd, buf); - if (ferror((FILE *)dp)) { perror(dumpf); quit(14); } - (void) fflush((FILE *)dp); + if (ferror((FILE *)dp)) { + perror(dumpf); + QUIT(EX_IOERR); + } + } - quit(0); + QUIT(EX_OK); } --- ipfwpcap.8.orig Sat Nov 11 05:08:21 2006 +++ ipfwpcap.8 Thu Dec 21 09:04:28 2006 @@ -24,7 +24,7 @@ .\" .\" $FreeBSD: /repoman/r/ncvs/src/usr.sbin/ipfwpcap/ipfwpcap.8,v 1.3 2006/09/30 19:07:03 ru Exp $ .\" -.Dd May 22, 2006 +.Dd Dec 20, 2006 .Dt IPFWPCAP 8 .Os .Sh NAME @@ -32,7 +32,9 @@ .Nd "copy diverted packets to a file in tcpdump format" .Sh SYNOPSIS .Nm -.Op Fl dr +.Op Fl dz +.Op Fl r | rr +.Op Fl i Ar flush_interval .Op Fl b Ar maxbytes .Op Fl p Ar maxpkts .Op Fl P Ar pidfile @@ -48,19 +50,44 @@ .Xr ipfw 8 to a port on which .Nm -listens. +daemon listens. The packets are then dropped unless .Fl r is used. .Pp +.Nm +closes and then re-opens the dump file when it receives +.Dv SIGHUP , +permitting +.Xr newsyslog 8 +to rotate dump logfiles automatically. +Note that already existing file will be truncated on open or re-open. +Receiving +.Dv SIGALRM +causes +.Nm +to flush the current logfile buffers to the disk, thus making the most +recent logs available. 
+The buffers are also flushed every +.Ar flush_interval +seconds. +.Pp The options are as follows: .Bl -tag -width indent .It Fl d -Turns on extra debugging messages. +Turns on debugging messages and prevents +.Nm +from making itself a background daemon. .It Fl r Writes packets back to the .Xr divert 4 socket. +This option can be used to reflect packets back to +.Xr ipfw 8 +if you for some reasons want to use +.Dq divert +rule action instead of usually more suitable +.Dq tee . .It Fl rr Indicates that it is okay to quit if .Ar maxbytes @@ -74,6 +101,17 @@ Stop dumping after .Ar maxbytes bytes. +Note that size of resulting +.Ar dumpfile +will be greater than +.Ar maxbytes +because +.Xr pcap 3 +stores additional headers for each packet in the file. +.It Fl i Ar flush_interval +Time in seconds to delay between automatic flushes of the file. +This may be specified with a value between 5 and 3600 seconds. +If not specified, the default is 60 seconds. .It Fl p Ar maxpkts Stop dumping after .Ar maxpkt @@ -81,7 +119,10 @@ .It Fl P Ar pidfile File to store PID number in. Default is -.Pa /var/run/ipwfpcap.portnr.pid . +.Pa /var/run/ipwfpcap. Ns Ao Ar portnum Ac Ns Pa .pid . +.It Fl z +Reset byte and packet counters to zero after each reopening of the +.Ar dumpfile . .El .Pp The @@ -98,7 +139,7 @@ .Sh EXIT STATUS .Ex -std .Sh EXAMPLES -.Dl "ipfwpcap -r 8091 divt.log &" +.Dl "ipfwpcap -r 8091 divt.log" .Pp Starts .Nm @@ -117,12 +158,13 @@ .Xr tcpdump 1 , .Xr pcap 3 , .Xr divert 4 , -.Xr ipfw 8 +.Xr ipfw 8 , +.Xr pflogd 8 .Sh HISTORY The .Nm utility first appeared in -.Fx 7.0 . +.Fx 6.3 . 
.Sh AUTHORS .An -nosplit .Nm ------------2UNlkgEOncIwckbiXYzy0f-- From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 10:54:24 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9220B16A46B for ; Tue, 18 Dec 2007 10:54:24 +0000 (UTC) (envelope-from krishna.ramdass@gmail.com) Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.187]) by mx1.freebsd.org (Postfix) with ESMTP id 0C4BE13C457 for ; Tue, 18 Dec 2007 10:54:23 +0000 (UTC) (envelope-from krishna.ramdass@gmail.com) Received: by fk-out-0910.google.com with SMTP id b27so2722036fka.11 for ; Tue, 18 Dec 2007 02:54:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; bh=PbEWHwGG8YCpt6O7Ik6kwd10Vwcd8qlQpYpUoHY1ir0=; b=BzTAYS1AcScPaVATlnSPmLkhKCUQxZxH3xmA92PWYy1zFXf4nXTydOb2Y96Ymen3vb8Rz05tR2eApa/qKJ5UmYjWvvEOKY0WDIFJRfb5Z+XDsiYB/ptffo9mCf51yGbHaUxnDeSsDgB49xCCX3en0Zc1pf8QsrBIcMoHD9qxApI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=mOLl/wqL4cBCTo0zKoge/7VPaoRRAasafo3OXUS4wss8lvtrkT2fOkxN+3dTs7VgBmfIPU6KRTkFLhT7JsyaJQtRmoHueLTOr/G7AzV5ftQju9OVZjXTd05tflhEc0NdAs8jfa0xooX0bXdWHDTog/Kd7OkUAbgajEd67CYpcLw= Received: by 10.82.134.12 with SMTP id h12mr7364897bud.29.1197975262751; Tue, 18 Dec 2007 02:54:22 -0800 (PST) Received: by 10.82.145.3 with HTTP; Tue, 18 Dec 2007 02:54:22 -0800 (PST) Message-ID: <8c1eada80712180254t578da8cj418838c3e941f26d@mail.gmail.com> Date: Tue, 18 Dec 2007 16:24:22 +0530 From: "Krishna Kumar" To: "Andrey V. 
Elsukov" In-Reply-To: <476673A6.8050306@yandex.ru> MIME-Version: 1.0 References: <8c1eada80712170043w216b36b7gb5de6a149b952604@mail.gmail.com> <476673A6.8050306@yandex.ru> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org, stsp@stsp.name, freebsd-drivers@freebsd.org Subject: Re: WOL suport in Broadcom 5721 (57XX) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 10:54:24 -0000

Hi,

I found the pages, but the WOL patch is not present for Broadcom chipsets (BCM57XX). I am stuck at the point where I am confused as to why it does not work, given that it works with my operating system (a Linux-based microkernel). Is somebody working on this? If so, I can surely be of help.

Thanks,
KK

On Dec 17, 2007 6:33 PM, Andrey V. Elsukov wrote:

> Krishna Kumar wrote:
> > Is this in the list of todo's??
> > Can this feature not be supported due to design issues?
> > Is somebody trying this out somewhere?
> > Please do copy me on the reply as I am not subscribed to the list.
>
> Look into freebsd-hackers@ mail archive. In the previous month there
> was a discussion about WOL support. Look to topics:
> 1. FreeBSD WOL sis on
> 2. How to add wake on lan support for your card
>
> And as i remember, Sam Leffler has made some work for WOL support.
>
> --
> WBR, Andrey V.
Elsukov > -- Thanks and Best Regards, KK From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 11:18:49 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1372E16A468; Tue, 18 Dec 2007 11:18:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 997B213C4E7; Tue, 18 Dec 2007 11:18:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBIBIZCE016955 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Dec 2007 22:18:38 +1100 Date: Tue, 18 Dec 2007 22:18:35 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Fullmer In-Reply-To: <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> Message-ID: <20071218220924.P6176@besplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 11:18:49 -0000 On Mon, 17 Dec 2007, Mark Fullmer wrote: > Thanks. Have a kernel building now. It takes about a day of uptime after > reboot before I'll see the problem. Yes run "find / >/dev/null" to see the problem if it is the syncer one. At least the syscall latency problem does seem to be this. 
Under ~5.2, with the above find and also "while :; do sync; done" (to give latency spikes more often), your program (with some fflush(stdout)'s and args 1 7700) gives:

% 1197976029041677 12696 0
% 1197976033196396 9761 4154719
% 1197976034060031 13360 863635
% 1197976039080632 13749 5020601
% 1197976043195594 8536 4114962
% 1197976044100601 13505 905007
% 1197976049121870 14562 5021269
% 1197976052195631 8192 3073761
% 1197976054141545 14024 1945914
% 1197976059162357 14623 5020812
% 1197976063195735 7830 4033378
% 1197976064182564 14618 986829
% 1197976069202982 14823 5020418
% 1197976074223722 15350 5020740
% 1197976079244311 15726 5020589
% 1197976084264690 15893 5020379
% 1197976089289409 15058 5024719
% 1197976094315433 16209 5026024
% 1197976095197277 8015 881844
% 1197976099335529 16092 4138252
% 1197976104356513 16863 5020984
% 1197976109376236 16373 5019723
% 1197976114396803 16727 5020567
% 1197976119416822 16533 5020019
% 1197976124437790 17288 5020968
% 1197976126200637 10060 1762847
% 1197976127198459 7839 997822
% 1197976129457321 16606 2258862
% 1197976134477582 16654 5020261

This clearly shows the spike every 5 seconds, and the latency creeping up as vfs.numvnodes increases. vfs.numvnodes started at about 20000 and ended at about 64000.

The syncer won't be fixed soon, so the fix for dropped packets requires figuring out why the syncer affects networking.
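[Editorial note] In the output above, the third column appears to be the gap in microseconds since the previous line's timestamp (first column). That is an inference from the numbers, not something the message states. A minimal sketch to check it against the first three rows; the helper name gap_us and the sample array are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* First three timestamps (microseconds) copied from the output above. */
static const int64_t sample_ts[] = {
	INT64_C(1197976029041677),
	INT64_C(1197976033196396),
	INT64_C(1197976034060031),
};

/* Gap in microseconds between entries j and i (j > i) of a timestamp list. */
static int64_t
gap_us(const int64_t *ts, size_t i, size_t j)
{
	return (ts[j] - ts[i]);
}
```

gap_us(sample_ts, 0, 1) reproduces the 4154719 printed on the second row, and the gap from the first row to the third is about 5.02 seconds, which matches the 5-second spacing Bruce points out.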
Bruce From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 11:59:12 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DCB316A580; Tue, 18 Dec 2007 11:59:12 +0000 (UTC) (envelope-from sem@FreeBSD.org) Received: from mail.ciam.ru (ns.ciam.ru [213.247.195.75]) by mx1.freebsd.org (Postfix) with ESMTP id 22B2413C442; Tue, 18 Dec 2007 11:59:12 +0000 (UTC) (envelope-from sem@FreeBSD.org) Received: from dhcp250-210.yandex.ru ([87.250.250.210]) by mail.ciam.ru with esmtpa (Exim 4.x) id 1J4aY9-0006g8-Cu; Tue, 18 Dec 2007 14:23:49 +0300 Message-ID: <4767AD27.8070901@FreeBSD.org> Date: Tue, 18 Dec 2007 14:21:11 +0300 From: Sergey Matveychuk User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Vadim Goncharov References: In-Reply-To: Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: maxim@freebsd.org, mlaier@freebsd.org, "freebsd-stable@freebsd.org" , "freebsd-net@freebsd.org" , "freebsd-ipfw@freebsd.org" , "freebsd-current@freebsd.org" , phk@freebsd.org Subject: Re: [PATCH] ipfwpcap(8) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 11:59:12 -0000 Vadim Goncharov wrote: > Hi, > > I've recently found a patch (also available at > http://antigreen.org/vadim/freebsd/ipfwpcap/) made by me and my friend > in January to ipfwpcap(8) introduced in 7.0. Now it have more features, Unfortunately too old to apply. And using of pidfile_* functions from libutil is preferable IMHO. -- Dixi. Sem. 
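[Editorial note] The pidfile_* family in FreeBSD's libutil (pidfile_open(3), pidfile_write(3), pidfile_remove(3)) relies on holding a lock on the pid file rather than probing the recorded PID with kill(pid, 0), which avoids the stale-file guesswork. The following is only a portable sketch of that locking idea using flock(2); it is not libutil's implementation, and the helper names lock_pidfile and store_pid are hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open (or create) the pid file and take an exclusive, non-blocking lock.
 * Returns the descriptor, or -1 with errno set; errno is EWOULDBLOCK when
 * another open descriptor already holds the lock.
 */
static int
lock_pidfile(const char *path)
{
	int fd, saved;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (-1);
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		saved = errno;		/* close() may clobber errno */
		(void)close(fd);
		errno = saved;
		return (-1);
	}
	return (fd);
}

/* Replace the file contents with "<pid>\n"; keep fd open to hold the lock. */
static int
store_pid(int fd, pid_t pid)
{
	char buf[32];
	int len;

	len = snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return (-1);
	if (ftruncate(fd, 0) == -1)
		return (-1);
	if (pwrite(fd, buf, (size_t)len, 0) != (ssize_t)len)
		return (-1);
	return (0);
}
```

A daemon would call lock_pidfile() before detaching, call store_pid() afterwards with its final PID, and keep the descriptor open until exit. The lock disappears automatically when the process dies, so a leftover file is never mistaken for a running instance.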
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 12:17:09 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B4ED116A417; Tue, 18 Dec 2007 12:17:09 +0000 (UTC) (envelope-from vadimnuclight@tpu.ru) Received: from relay1.tpu.ru (relay1.tpu.ru [213.183.112.102]) by mx1.freebsd.org (Postfix) with ESMTP id 49CBA13C467; Tue, 18 Dec 2007 12:17:08 +0000 (UTC) (envelope-from vadimnuclight@tpu.ru) Received: from localhost (localhost.localdomain [127.0.0.1]) by relay1.tpu.ru (Postfix) with ESMTP id 342DD1045DC; Tue, 18 Dec 2007 17:52:19 +0600 (NOVT) X-Virus-Scanned: amavisd-new at tpu.ru Received: from relay1.tpu.ru ([127.0.0.1]) by localhost (relay1.tpu.ru [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ECq9l8TNcU15; Tue, 18 Dec 2007 17:52:15 +0600 (NOVT) Received: from mail.main.tpu.ru (mail.main.tpu.ru [10.0.0.3]) by relay1.tpu.ru (Postfix) with ESMTP id A2BE2104605; Tue, 18 Dec 2007 17:52:15 +0600 (NOVT) Received: from mail.tpu.ru ([213.183.112.105]) by mail.main.tpu.ru with Microsoft SMTPSVC(6.0.3790.3959); Tue, 18 Dec 2007 17:52:15 +0600 Received: from nuclight.avtf.net ([78.140.2.237]) by mail.tpu.ru over TLS secured channel with Microsoft SMTPSVC(6.0.3790.3959); Tue, 18 Dec 2007 17:52:15 +0600 Date: Tue, 18 Dec 2007 17:52:11 +0600 To: "Sergey Matveychuk" References: <4767AD27.8070901@FreeBSD.org> From: "Vadim Goncharov" Organization: AVTF TPU Hostel Content-Type: multipart/mixed; boundary=----------GJeOyVu3xITaB9Flyz9pX0 MIME-Version: 1.0 Message-ID: In-Reply-To: <4767AD27.8070901@FreeBSD.org> User-Agent: Opera M2/7.54 (Win32, build 3865) X-OriginalArrivalTime: 18 Dec 2007 11:52:15.0158 (UTC) FILETIME=[6ED1DD60:01C8416C] Cc: maxim@freebsd.org, mlaier@freebsd.org, "freebsd-stable@freebsd.org" , "freebsd-net@freebsd.org" , "freebsd-ipfw@freebsd.org" , "freebsd-current@freebsd.org" , phk@freebsd.org Subject: Re: [PATCH] ipfwpcap(8) X-BeenThere: 
freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 12:17:09 -0000 ------------GJeOyVu3xITaB9Flyz9pX0 Content-Type: text/plain; format=flowed; delsp=yes; charset=koi8-r Content-Transfer-Encoding: 8bit 18.12.07 @ 17:21 Sergey Matveychuk wrote: >> I've recently found a patch (also available at >> http://antigreen.org/vadim/freebsd/ipfwpcap/) made by me and my friend >> in January to ipfwpcap(8) introduced in 7.0. Now it have more features, > > Unfortunately too old to apply. Mislooked that, sorry. But revision 1.3 differs only with line signal (SIGINT, ...); - which my patch also includes. I've attached patch against current revision 1.3 (use it instead of original letter's one). > And using of pidfile_* functions from libutil is preferable IMHO. Surely, but I think that should be another commit, as not a user-visible change. -- WBR, Vadim Goncharov ------------GJeOyVu3xITaB9Flyz9pX0 Content-Disposition: attachment; filename=ipfwpcap.patch Content-Type: application/octet-stream; name=ipfwpcap.patch Content-Transfer-Encoding: 8bit --- ipfwpcap.c.1.3 Tue Dec 18 17:35:14 2007 +++ ipfwpcap.c Tue Dec 19 08:30:00 2006 @@ -1,95 +1,120 @@ +/*- + * Copyright (c) 2004 University of Toronto. All rights reserved. + * Anyone may use or copy this software except that this copyright + * notice remain intact and that credit is given where it is due. + * The University of Toronto and the author make no warranty and + * accept no liability for this software. + * + * $FreeBSD: /repoman/r/ncvs/src/usr.sbin/ipfwpcap/ipfwpcap.c,v 1.2 2006/09/04 19:30:44 sam Exp $ + */ + /* - * copy diverted (or tee'd) packets to a file in 'tcpdump' format + * Copy diverted (or tee'd) packets to a file in 'tcpdump' format * (ie. this uses the '-lpcap' routines). 
* - * example usage: - * # ipfwpcap -r 8091 divt.log & + * Example usage: + * # ipfwpcap -r 8091 divt.log * # ipfw add 2864 divert 8091 ip from 128.432.53.82 to any * # ipfw add 2864 divert 8091 ip from any to 128.432.53.82 * * the resulting dump file can be read with ... * # tcpdump -nX -r divt.log - */ -/* - * Written by P Kern { pkern [AT] cns.utoronto.ca } * - * Copyright (c) 2004 University of Toronto. All rights reserved. - * Anyone may use or copy this software except that this copyright - * notice remain intact and that credit is given where it is due. - * The University of Toronto and the author make no warranty and - * accept no liability for this software. - * - * From: Header: /local/src/local.lib/SRC/ipfwpcap/RCS/ipfwpcap.c,v 1.4 2004/01/15 16:19:07 pkern Exp - * - * $FreeBSD: src/usr.sbin/ipfwpcap/ipfwpcap.c,v 1.3 2007/10/12 14:57:39 csjp Exp $ + * Written by P Kern { pkern [AT] cns.utoronto.ca } + * Adopted by V Pavluk (vladvic_r@mail.ru) + * - changed sighup handler to reopen log file + * - added sigalrm handler to flush data + * - some of exit() codes changed to sysexits(3) + * Major code reworking by Vadim Goncharov + * - signals and daemonizing rewritten + * - style(9) reformat, more sysexits(3) and other cleanups + * - enabled own alarm sending, changed options and updated man page */ +#include +#include +#include /* For MAXPATHLEN */ +#include +#include + +#include +#include /* For IP_MAXPACKET */ +#include /* For IP_MAXPACKET */ + #include #include #include #include #include #include -#include -#include -#include /* for MAXPATHLEN */ -#include -#include - -#include /* for IP_MAXPACKET */ -#include /* for IP_MAXPACKET */ +#include /* XXX normally defined in config.h */ -#define HAVE_STRLCPY 1 -#define HAVE_SNPRINTF 1 -#define HAVE_VSNPRINTF 1 -#include /* see pcap(3) and /usr/src/contrib/libpcap/. */ +#define HAVE_STRLCPY 1 +#define HAVE_SNPRINTF 1 +#define HAVE_VSNPRINTF 1 +#include /* See pcap(3) and /usr/src/contrib/libpcap/. 
*/ #ifdef IP_MAXPACKET -#define BUFMAX IP_MAXPACKET +#define BUFMAX IP_MAXPACKET #else -#define BUFMAX 65535 +#define BUFMAX 65535 #endif #ifndef MAXPATHLEN #define MAXPATHLEN 1024 #endif -static int debug = 0; -static int reflect = 0; /* 1 == write packet back to socket. */ +#define MAXPIDLEN 9 /* Max decimal pid length in characters. */ +#define DEFINTERVAL 60 /* How often to do flush (in seconds). */ -static ssize_t totbytes = 0, maxbytes = 0; -static ssize_t totpkts = 0, maxpkts = 0; +static char *prog = NULL; +static char pidfile[MAXPATHLEN] = { '\0' }; -char *prog = NULL; -char pidfile[MAXPATHLEN] = { '\0' }; +static int quit = 0; /* Is it time to exit? */ +static int do_flush = 0; /* Time to flush log. */ +static int do_reopen = 0; /* Time for log rotating. */ +static int flush_interval = DEFINTERVAL; /* - * tidy up. + * Tidy up macro. */ -void -quit(sig) -int sig; +#define QUIT(code) do { \ + pcap_dump_flush(dp); \ + (void)unlink(pidfile); \ + exit(code); \ +} while(0); + +void +quit_sig(int sig) +{ + quit = 1; +} + +void +flush_log(int sigalrm) { - (void) unlink(pidfile); - exit(sig); + do_flush = 1; + alarm(flush_interval); +} + +void +reopen_log(int sighup) +{ + do_reopen = 1; } /* - * do the "paper work" - * - save my own pid in /var/run/$0.{port#}.pid + * Do the "paper work". + * - fork and detach from terminal, if needed. + * - save my own pid in /var/run/$0.{port#}.pid. 
*/ -okay(pn) -int pn; +void +okay(int pn, int detach, int nochdir) { - FILE *fp; - int fd, numlen, n; - char *p, numbuf[80]; - - numlen = sizeof(numbuf); - bzero(numbuf, numlen); - snprintf(numbuf, numlen-1, "%ld\n", getpid()); - numlen = strlen(numbuf); + int pf; + char *p, strpid[MAXPIDLEN + 1]; + pid_t pid; if (pidfile[0] == '\0') { p = (char *)rindex(prog, '/'); @@ -99,94 +124,158 @@ "%s%s.%d.pid", _PATH_VARRUN, p, pn); } - fd = open(pidfile, O_WRONLY|O_CREAT|O_EXCL, 0644); - if (fd < 0) { perror(pidfile); exit(21); } + pf = open(pidfile, O_WRONLY | O_CREAT | O_EXCL | O_EXLOCK, 0644); - siginterrupt(SIGTERM, 1); - siginterrupt(SIGHUP, 1); - signal (SIGTERM, quit); - signal (SIGHUP, quit); - signal (SIGINT, quit); - - n = write(fd, numbuf, numlen); - if (n < 0) { perror(pidfile); quit(23); } - (void) close(fd); + /* + * We couldn't create pid file + */ + if (pf == -1) { + if (errno == EEXIST) { + /* + * If it is because it's already exists + */ + bzero(strpid, MAXPIDLEN + 1); + fprintf(stderr, "PID file already exists!\n"); + + /* + * Try to read the PID stored in the existing file + */ + pf = open(pidfile, O_RDONLY); + if (pf == -1) { + perror("Error opening PID file for reading"); + exit(EX_IOERR); + } + if (read(pf, strpid, MAXPIDLEN) < 0) { + perror("Error reading PID file"); + exit(EX_IOERR); + } + pid = atol(strpid); + close(pf); + + /* + * We found PID, try to determine, whether process + * is running + */ + if (kill(pid, 0) == 0) { + /* + * Signal is delivered, though process with + * such PID exists + */ + fprintf(stderr, "%s already running with PID=%d, exiting...\n", prog, pid); + exit(1); + } else { + /* + * It seems, like the process is killed, so + * we can proceed... 
+ */ + fprintf(stderr, "Stale PID file, overwriting...\n"); + pf = open(pidfile, O_WRONLY | O_TRUNC | O_EXLOCK); + if (pf == -1) { + perror("Error opening PID file for writing"); + exit(EX_IOERR); + } + } + } else { + perror("Error creating PID file"); + exit(EX_IOERR); + } + } + + if (detach) { + if (daemon(nochdir, 0) != 0) { + close(pf); + (void)unlink(pidfile); + perror("daemon"); + exit(EX_OSERR); + } + } + + /* + * Set signal handlers and system behaviour. This must be done + * before saving PID to prevent small, but possible race condition + * when another instance failed to create PID, reads it and tries + * to send signal to us. + */ + siginterrupt(SIGTERM, 1); + siginterrupt(SIGHUP, 1); + + /* Ignore 0th signal, or process may be killed with it by default... */ + signal(0, SIG_IGN); + signal(SIGINT, quit_sig); + signal(SIGTERM, quit_sig); + signal(SIGHUP, reopen_log); + signal(SIGALRM, flush_log); + + /* Save our PID to pidfile. */ + bzero(strpid, MAXPIDLEN + 1); + snprintf(strpid, MAXPIDLEN, "%ld\n", getpid()); + if (write(pf, strpid, strlen(strpid)) < 0) { + perror("Error writing PID file"); + exit(EX_IOERR); + } + close(pf); } +void usage() { - fprintf(stderr, "\ -\n\ -usage:\n\ - %s [-dr] [-b maxbytes] [-p maxpkts] [-P pidfile] portnum dumpfile\n\ -\n\ -where:\n\ - '-d' = enable debugging messages.\n\ - '-r' = reflect. write packets back to the divert socket.\n\ - (ie. 
simulate the original intent of \"ipfw tee\").\n\ - '-rr' = indicate that it is okay to quit if packet-count or\n\ - byte-count limits are reached (see the NOTE below\n\ - about what this implies).\n\ - '-b bytcnt' = stop dumping after {bytcnt} bytes.\n\ - '-p pktcnt' = stop dumping after {pktcnt} packets.\n\ - '-P pidfile' = alternate file to store the PID\n\ - (default: /var/run/%s.{portnum}.pid).\n\ -\n\ - portnum = divert(4) socket port number.\n\ - dumpfile = file to write captured packets (tcpdump format).\n\ - (specify '-' to write packets to stdout).\n\ -\n\ -", prog, prog); - - fprintf(stderr, "\ -The '-r' option should not be necessary, but because \"ipfw tee\" is broken\n\ -(see BUGS in ipfw(8) for details) this feature can be used along with\n\ -an \"ipfw divert\" rule to simulate the original intent of \"ipfw tee\".\n\ -\n\ -NOTE: With an \"ipfw divert\" rule, diverted packets will silently\n\ - disappear if there is nothing listening to the divert socket.\n\ -\n\ -"); - exit(-1); + fprintf(stderr, + "usage: %s [-dz] [-r | -rr] [-i flush_interval] [-b maxbytes] [-p maxpkts] [-P pidfile] portnum dumpfile\n", + prog); + + exit(EX_USAGE); } -main(ac, av) -int ac; -char *av[]; +main(int argc, char *argv[]) { int r, sd, portnum, l; - struct sockaddr_in sin; - int errflg = 0; + struct sockaddr_in sin; + int errflg = 0, zeroize = 0; int nfd; fd_set rds; ssize_t nr; - char *dumpf, buf[BUFMAX]; + char buf[BUFMAX]; + + int debug = 0; + int reflect = 0; /* 1 == write packet back to socket. */ + + ssize_t totbytes = 0, maxbytes = 0; + ssize_t totpkts = 0, maxpkts = 0; - pcap_t *p; - pcap_dumper_t *dp; struct pcap_pkthdr phd; + pcap_t *p; + pcap_dumper_t *dp; /* Global, as signal handlers may want it. 
*/ + char *dumpf; - prog = av[0]; + prog = argv[0]; - while ((r = getopt(ac, av, "drb:p:P:")) != -1) { + while ((r = getopt(argc, argv, "drzb:i:p:P:")) != -1) { switch (r) { case 'd': - debug++; + debug = 1; break; case 'r': reflect++; break; + case 'i': + flush_interval = atoi(optarg); + if ((flush_interval < 5) || (flush_interval > 3600)) + flush_interval = DEFINTERVAL; + break; case 'b': - maxbytes = (ssize_t) atol(optarg); + maxbytes = (ssize_t)atol(optarg); break; case 'p': - maxpkts = (ssize_t) atoi(optarg); + maxpkts = (ssize_t)atoi(optarg); + break; + case 'z': + zeroize = 1; break; case 'P': - strcpy(pidfile, optarg); + strlcpy(pidfile, optarg, sizeof(pidfile)); break; case '?': default: @@ -195,17 +284,18 @@ } } - if ((ac - optind) != 2 || errflg) + if (((argc - optind) != 2) || errflg) usage(); - portnum = atoi(av[optind++]); - dumpf = av[optind]; + portnum = atoi(argv[optind++]); + dumpf = argv[optind]; -if (debug) fprintf(stderr, "bind to %d.\ndump to '%s'.\n", portnum, dumpf); + if (debug) + fprintf(stderr, "bind to %d.\ndump to '%s'.\n", portnum, dumpf); if ((r = socket(PF_INET, SOCK_RAW, IPPROTO_DIVERT)) == -1) { perror("socket(DIVERT)"); - exit(2); + exit(EX_OSERR); } sd = r; @@ -215,92 +305,136 @@ if (bind(sd, (struct sockaddr *)&sin, sizeof(sin)) == -1) { perror("bind(divert)"); - exit(3); + exit(EX_OSERR); } p = pcap_open_dead(DLT_RAW, BUFMAX); dp = pcap_dump_open(p, dumpf); if (dp == NULL) { pcap_perror(p, dumpf); - exit(4); + exit(EX_OSFILE); } + + /* + * We will not chdir() to root directory if user specified + * non-absolute pathname to logfile, because in this case + * logfile will be created in another directory after first + * reopening on SIGHUP. + */ + okay(portnum, !debug, dumpf[0] == '/' ? 0 : 1); - okay(portnum); + alarm(flush_interval); /* Start timer. */ nfd = sd + 1; - for (;;) { + while (!quit) { + /* + * Handle signal actions on next iteration after select()'s EINTR. 
+ */ + if (do_flush) { + if (debug) + fprintf(stderr, "Flushing log.\n"); + pcap_dump_flush(dp); + do_flush = 0; + } + + if (do_reopen) { + if (debug) + fprintf(stderr, "Reopening log.\n"); + pcap_dump_close(dp); + dp = pcap_dump_open(p, dumpf); + if (zeroize) { + totbytes = 0; + totpkts = 0; + } + do_reopen = 0; + } + + /* Prepare for select(). */ FD_ZERO(&rds); FD_SET(sd, &rds); r = select(nfd, &rds, NULL, NULL, NULL); if (r == -1) { - if (errno == EINTR) continue; + if (errno == EINTR) + continue; perror("select"); - quit(11); + QUIT(EX_OSERR); } - if (!FD_ISSET(sd, &rds)) - /* hmm. no work. */ - continue; + continue; /* Hmm. No work. */ /* - * use recvfrom(3 and sendto(3) as in natd(8). - * see /usr/src/sbin/natd/natd.c - * see ipfw(8) about using 'divert' and 'tee'. + * Use recvfrom(3 and sendto(3) as in natd(8). + * See /usr/src/sbin/natd/natd.c. + * See ipfw(8) about using 'divert' and 'tee'. */ /* - * read packet. + * Read packet. */ l = sizeof(sin); nr = recvfrom(sd, buf, sizeof(buf), 0, (struct sockaddr *)&sin, &l); -if (debug) fprintf(stderr, "recvfrom(%d) = %d (%d)\n", sd, nr, l); - if (nr < 0 && errno != EINTR) { + + if (debug) + fprintf(stderr, "recvfrom(%d) = %d (%d)\n", sd, nr, l); + + if ((nr < 0) && (errno != EINTR)) { perror("recvfrom(sd)"); - quit(12); + QUIT(EX_IOERR); } - if (nr <= 0) continue; + if (nr <= 0) + continue; - if (reflect) { + if (reflect > 0) { /* - * write packet back so it can continue - * being processed by any further IPFW rules. + * Write packet back so it can continue being + * processed by any further IPFW rules. */ l = sizeof(sin); r = sendto(sd, buf, nr, 0, (struct sockaddr *)&sin, l); -if (debug) fprintf(stderr, " sendto(%d) = %d\n", sd, r); - if (r < 0) { perror("sendto(sd)"); quit(13); } + if (debug) + fprintf(stderr, "sendto(%d) = %d\n", sd, r); + if (r < 0) { + perror("sendto(sd)"); + QUIT(EX_IOERR); + } } /* - * check maximums, if any. - * but don't quit if must continue reflecting packets. 
+ * Check maximums, if any. But don't quit if must continue + * reflecting packets. However, it's ok to exit when + * reflect > 1. */ if (maxpkts) { totpkts++; if (totpkts > maxpkts) { - if (reflect == 1) continue; - quit(0); + if (reflect == 1) + continue; + QUIT(EX_OK); } } if (maxbytes) { totbytes += nr; if (totbytes > maxbytes) { - if (reflect == 1) continue; - quit(0); + if (reflect == 1) + continue; + QUIT(EX_OK); } } /* - * save packet in tcpdump(1) format. see pcap(3). - * divert packets are fully assembled. see ipfw(8). + * Save packet in tcpdump(1) format. See pcap(3). Divert + * packets are fully assembled, see ipfw(8). */ - (void) gettimeofday(&(phd.ts), NULL); + (void)gettimeofday(&(phd.ts), NULL); phd.caplen = phd.len = nr; pcap_dump((u_char *)dp, &phd, buf); - if (ferror((FILE *)dp)) { perror(dumpf); quit(14); } - (void) fflush((FILE *)dp); + if (ferror((FILE *)dp)) { + perror(dumpf); + QUIT(EX_IOERR); + } + } - quit(0); + QUIT(EX_OK); } --- ipfwpcap.8.orig Sat Nov 11 05:08:21 2006 +++ ipfwpcap.8 Thu Dec 21 09:04:28 2006 @@ -24,7 +24,7 @@ .\" .\" $FreeBSD: /repoman/r/ncvs/src/usr.sbin/ipfwpcap/ipfwpcap.8,v 1.3 2006/09/30 19:07:03 ru Exp $ .\" -.Dd May 22, 2006 +.Dd Dec 20, 2006 .Dt IPFWPCAP 8 .Os .Sh NAME @@ -32,7 +32,9 @@ .Nd "copy diverted packets to a file in tcpdump format" .Sh SYNOPSIS .Nm -.Op Fl dr +.Op Fl dz +.Op Fl r | rr +.Op Fl i Ar flush_interval .Op Fl b Ar maxbytes .Op Fl p Ar maxpkts .Op Fl P Ar pidfile @@ -48,19 +50,44 @@ .Xr ipfw 8 to a port on which .Nm -listens. +daemon listens. The packets are then dropped unless .Fl r is used. .Pp +.Nm +closes and then re-opens the dump file when it receives +.Dv SIGHUP , +permitting +.Xr newsyslog 8 +to rotate dump logfiles automatically. +Note that already existing file will be truncated on open or re-open. +Receiving +.Dv SIGALRM +causes +.Nm +to flush the current logfile buffers to the disk, thus making the most +recent logs available. 
+The buffers are also flushed every +.Ar flush_interval +seconds. +.Pp The options are as follows: .Bl -tag -width indent .It Fl d -Turns on extra debugging messages. +Turns on debugging messages and prevents +.Nm +from making itself a background daemon. .It Fl r Writes packets back to the .Xr divert 4 socket. +This option can be used to reflect packets back to +.Xr ipfw 8 +if you for some reasons want to use +.Dq divert +rule action instead of usually more suitable +.Dq tee . .It Fl rr Indicates that it is okay to quit if .Ar maxbytes @@ -74,6 +101,17 @@ Stop dumping after .Ar maxbytes bytes. +Note that size of resulting +.Ar dumpfile +will be greater than +.Ar maxbytes +because +.Xr pcap 3 +stores additional headers for each packet in the file. +.It Fl i Ar flush_interval +Time in seconds to delay between automatic flushes of the file. +This may be specified with a value between 5 and 3600 seconds. +If not specified, the default is 60 seconds. .It Fl p Ar maxpkts Stop dumping after .Ar maxpkt @@ -81,7 +119,10 @@ .It Fl P Ar pidfile File to store PID number in. Default is -.Pa /var/run/ipwfpcap.portnr.pid . +.Pa /var/run/ipwfpcap. Ns Ao Ar portnum Ac Ns Pa .pid . +.It Fl z +Reset byte and packet counters to zero after each reopening of the +.Ar dumpfile . .El .Pp The @@ -98,7 +139,7 @@ .Sh EXIT STATUS .Ex -std .Sh EXAMPLES -.Dl "ipfwpcap -r 8091 divt.log &" +.Dl "ipfwpcap -r 8091 divt.log" .Pp Starts .Nm @@ -117,12 +158,13 @@ .Xr tcpdump 1 , .Xr pcap 3 , .Xr divert 4 , -.Xr ipfw 8 +.Xr ipfw 8 , +.Xr pflogd 8 .Sh HISTORY The .Nm utility first appeared in -.Fx 7.0 . +.Fx 6.3 . 
.Sh AUTHORS .An -nosplit .Nm ------------GJeOyVu3xITaB9Flyz9pX0-- From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 06:09:07 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA7CA16A41A for ; Tue, 18 Dec 2007 06:09:07 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from swin.edu.au (gpo4.cc.swin.edu.au [136.186.1.224]) by mx1.freebsd.org (Postfix) with ESMTP id 395A913C455 for ; Tue, 18 Dec 2007 06:09:05 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from [136.186.229.95] (lstewart.caia.swin.edu.au [136.186.229.95]) by swin.edu.au (8.13.6.20060614/8.13.1) with ESMTP id lBI4sftq018598; Tue, 18 Dec 2007 15:54:42 +1100 Message-ID: <47675291.5070101@swin.edu.au> Date: Tue, 18 Dec 2007 15:54:41 +1100 From: Lawrence Stewart User-Agent: Thunderbird 1.5.0.9 (X11/20070123) MIME-Version: 1.0 To: "freebsd-net@freebsd.org" , Iccrg@cs.ucl.ac.uk, tmrg-interest@ICSI.Berkeley.EDU, end2end-interest@postel.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.1.9 X-Spam-Checker-Version: SpamAssassin 3.1.9 (2007-02-13) on gpo4.cc.swin.edu.au X-Mailman-Approved-At: Tue, 18 Dec 2007 12:21:44 +0000 Cc: James Healy , grenville armitage , David Malone , Randall Stewart , Fred Baker , Douglas Leith , Robert Shorten , Larry Dunn Subject: Modular/Pluggable TCP Congestion Control for FreeBSD X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 06:09:07 -0000 Hi all, We've been involved in a research project to implement and test an emerging TCP congestion control algorithm under FreeBSD. 
As a part of this, we've put together a patch for FreeBSD 7.0-BETA4 that modularises the congestion control code in the TCP stack. It allows for new congestion control algorithms to be developed as loadable kernel modules. This improves FreeBSD's usefulness as a TCP research platform and makes it easier to customise the stack for specific scenarios like high bandwidth, long delay paths. There is an accompanying technical report "Light-Weight Modular TCP Congestion Control for FreeBSD 7" [1] that covers the design, features, kernel interface and usage of the framework. Also on our website is a beta release of a module that implements the H-TCP[2] congestion control algorithm proposed by the Hamilton Institute. We believe that modular congestion control is a worthwhile addition to FreeBSD. We've performed significant internal testing and there are currently no known issues or regressions with the implementation compared to a 'vanilla' FreeBSD 7.0-BETA4 kernel. We would welcome further review and testing from the wider community in the hope of getting this patch folded into FreeBSD 8-CURRENT. SIFTR [3], our tool for monitoring FreeBSD kernel TCP connection state, has also received a minor update to v1.1.5, with the addition of 6 new, useful variables. All code and documentation is available on our website[3]. 
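[Editorial note] To illustrate what "modular" congestion control means in general terms, here is a minimal user-space sketch of a function-pointer table that a TCP stack could dispatch through. Every name here (cc_algo, cc_state, cc_register, cc_lookup, the newreno_* callbacks) is hypothetical and is not taken from the patch or the technical report; the window arithmetic is a simplified NewReno-style rule, not the kernel's.

```c
#include <stddef.h>
#include <string.h>

/* Per-connection congestion state (bytes). */
struct cc_state {
	unsigned long cwnd;
	unsigned long ssthresh;
	unsigned long mss;
};

/* Hook table that one algorithm module would fill in. */
struct cc_algo {
	const char *name;
	void (*ack_received)(struct cc_state *);
	void (*loss_detected)(struct cc_state *);
};

#define CC_MAX_ALGOS 8
static const struct cc_algo *cc_table[CC_MAX_ALGOS];
static size_t cc_count;

/* Add an algorithm to the table; returns 0 on success, -1 if full. */
static int
cc_register(const struct cc_algo *algo)
{
	if (cc_count >= CC_MAX_ALGOS)
		return (-1);
	cc_table[cc_count++] = algo;
	return (0);
}

/* Find a registered algorithm by name, or NULL if there is none. */
static const struct cc_algo *
cc_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < cc_count; i++)
		if (strcmp(cc_table[i]->name, name) == 0)
			return (cc_table[i]);
	return (NULL);
}

static void
newreno_ack(struct cc_state *s)
{
	if (s->cwnd < s->ssthresh)
		s->cwnd += s->mss;			/* slow start */
	else
		s->cwnd += s->mss * s->mss / s->cwnd;	/* congestion avoidance */
}

static void
newreno_loss(struct cc_state *s)
{
	s->ssthresh = s->cwnd / 2;			/* halve on loss */
	if (s->ssthresh < 2 * s->mss)
		s->ssthresh = 2 * s->mss;
	s->cwnd = s->ssthresh;
}

static const struct cc_algo newreno_algo = {
	"newreno", newreno_ack, newreno_loss
};
```

The stack side only ever calls through the table, so an alternative such as H-TCP could in principle be supplied by a separately loaded module that registers its own struct cc_algo.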
Cheers, Jim and Lawrence http://caia.swin.edu.au [1] http://caia.swin.edu.au/reports/071218A/CAIA-TR-071218A.pdf [2] http://www.hamilton.ie/net/htcp3.pdf [3] http://caia.swin.edu.au/urp/newtcp/tools.html From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 12:37:02 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5481216A419; Tue, 18 Dec 2007 12:37:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id D586C13C469; Tue, 18 Dec 2007 12:37:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBICapqd006882 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Dec 2007 23:36:53 +1100 Date: Tue, 18 Dec 2007 23:36:50 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Scott Long In-Reply-To: <47676E96.4030708@samsco.org> Message-ID: <20071218233644.U756@besplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 12:37:02 -0000 On Mon, 17 Dec 2007, Scott Long wrote: > Bruce Evans wrote: >> On Mon, 17 Dec 2007, David G Lawrence wrote: >> >>> One more comment on my last email... 
The patch that I included is not
>>> meant as a real fix - it is just a bandaid. The real problem appears to
>>> be that a very large number of vnodes (all of them?) are getting synced
>>> (i.e. calling ffs_syncvnode()) every time. This should normally only
>>> happen for dirty vnodes. I suspect that something is broken with this
>>> check:
>>>
>>>     if (vp->v_type == VNON || ((ip->i_flag &
>>>         (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
>>>          vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
>>>             VI_UNLOCK(vp);
>>>             continue;
>>>     }
>>
>> Isn't it just the O(N) algorithm with N quite large? Under ~5.2, on

> Right, it's a non-optimal loop when N is very large, and that's a fairly
> well understood problem. I think what DG was getting at, though, is
> that this massive flush happens every time the syncer runs, which
> doesn't seem correct. Sure, maybe you just rsynced 100,000 files 20
> seconds ago, so the upcoming flush is going to be expensive. But the
> next flush 30 seconds after that shouldn't be just as expensive, yet it
> appears to be so.

I'm sure it doesn't cause many bogus flushes. iostat shows zero writes caused by calling this incessantly using "while :; do sync; done".

> This is further supported by the original poster's
> claim that it takes many hours of uptime before the problem becomes
> noticeable. If vnodes are never truly getting cleaned, or never getting
> their flags cleared so that this loop knows that they are clean, then
> it's feasible that they'll accumulate over time, keep on getting flushed
> every 30 seconds, keep on bogging down the loop, and so on.
Using "find / >/dev/null" to grow the problem and make it bad after a few seconds of uptime, and profiling a single sync(2) call to show that nothing much is done except the loop containing the above:

under ~5.2, on a 2.2GHz A64 UP in i386 mode:

after booting, with about 700 vnodes:

% %   cumulative   self              self     total
%  time   seconds   seconds    calls  ns/call  ns/call  name
%  30.8      0.000    0.000        0  100.00%           mcount [4]
%  14.9      0.001    0.000        0  100.00%           mexitcount [5]
%   5.5      0.001    0.000        0  100.00%           cputime [16]
%   5.0      0.001    0.000        6    13312    13312  vfs_msync [18]
%   4.3      0.001    0.000        0  100.00%           user [21]
%   3.5      0.001    0.000        5    11321    11993  ffs_sync [23]

after "find / >/dev/null" was stopped after saturating at 64000 vnodes (desiredvnodes is 70240):

% %   cumulative   self              self     total
%  time   seconds   seconds    calls  ns/call  ns/call  name
%  50.7      0.008    0.008        5  1666427  1667246  ffs_sync [5]
%  38.0      0.015    0.006        6  1041217  1041217  vfs_msync [6]
%   3.1      0.015    0.001        0  100.00%           mcount [7]
%   1.5      0.015    0.000        0  100.00%           mexitcount [8]
%   0.6      0.015    0.000        0  100.00%           cputime [22]
%   0.6      0.016    0.000       34     2660     2660  generic_bcopy [24]
%   0.5      0.016    0.000        0  100.00%           user [26]

vfs_msync() is a problem too. It uses an almost identical loop for the case where the vnode is not dirty (but has a different condition for being dirty). ffs_sync() is called 5 times because there are 5 ffs file systems mounted r/w. There is another ffs file system mounted r/o and that combined with a missing r/o optimization might give the extra call to vfs_msync().

With 64000 vnodes, the calls take 1-2 ms each. That is already quite a lot, and there are many calls. Each call only looks at vnodes under the mount point so the number of mounted file systems doesn't affect the total time much.

ffs_sync() is taking 125 ns per vnode. That is more than I would have expected.
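[Editorial note] The per-vnode figure follows from the second profile: the ffs_sync() calls together walk roughly all 64000 vnodes, so the total self time divided by the vnode count gives the cost per vnode. A small sketch of that arithmetic; the function name per_vnode_ns is hypothetical, and it assumes essentially all vnodes belong to the five r/w ffs mounts:

```c
/*
 * Average cost per vnode in nanoseconds, given the number of calls, the
 * profiler's self ns/call, and the total number of vnodes visited by all
 * the calls together.
 */
static double
per_vnode_ns(unsigned calls, double self_ns_per_call, double vnodes)
{
	return ((double)calls * self_ns_per_call / vnodes);
}
```

With the ns/call column, per_vnode_ns(5, 1666427, 64000) comes out at about 130 ns; using the rounded 0.008 s self time instead gives 0.008e9 / 64000 = 125 ns, the figure quoted above. The same calculation for vfs_msync() (6 calls at 1041217 ns) gives just under 100 ns per vnode.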
Bruce From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 12:40:03 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2137316A41A for ; Tue, 18 Dec 2007 12:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0368313C4DD for ; Tue, 18 Dec 2007 12:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBICe2ql023112 for ; Tue, 18 Dec 2007 12:40:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBICe2BM023111; Tue, 18 Dec 2007 12:40:02 GMT (envelope-from gnats) Date: Tue, 18 Dec 2007 12:40:02 GMT Message-Id: <200712181240.lBICe2BM023111@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: "Nagy Keve" Cc: Subject: Re: kern/112654: [pcn] Kernel panic upon if_pcn module load on a Netfinity 5000 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Nagy Keve List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 12:40:03 -0000 The following reply was made to PR kern/112654; it has been noted by GNATS. From: "Nagy Keve" To: bug-followup@freebsd.org Cc: chuzzwassa@gmail.com Subject: Re: kern/112654: [pcn] Kernel panic upon if_pcn module load on a Netfinity 5000 Date: Tue, 18 Dec 2007 13:30:31 +0100 Src synced to RELENG_6 today via CVSup. $NetBSD: nsphy.c,v 1.18 1999/07/14 23:57:36 thorpej Exp $ I still have v1.18 of nsphy.c dating back to 1999, unlike the version 1.28 Andy seemed to have in FreeBSD 7 which shows a date of 2007. Inserted the suggested new line manually.
Rebuilt the kernel, rebooted the system. The good news: The if_pcn module loads without kernel panic even while the RJ45 cable is connected. The bad news: The pcn interface can no longer recognize a connected cable. It always shows: status: no carrier SUMMARY: UNSUITABLE PATCH - The issue is still open. From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 12:55:13 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42D8C16A420 for ; Tue, 18 Dec 2007 12:55:13 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id 9224B13C478 for ; Tue, 18 Dec 2007 12:55:12 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 87173 invoked from network); 18 Dec 2007 12:23:20 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 18 Dec 2007 12:23:20 -0000 Message-ID: <4767C338.4070709@freebsd.org> Date: Tue, 18 Dec 2007 13:55:20 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.13 (Windows/20070809) MIME-Version: 1.0 To: Lawrence Stewart References: <47675291.5070101@swin.edu.au> In-Reply-To: <47675291.5070101@swin.edu.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Tue, 18 Dec 2007 13:00:18 +0000 Cc: James Healy , grenville armitage , "freebsd-net@freebsd.org" , Iccrg@cs.ucl.ac.uk, Randall Stewart , tmrg-interest@ICSI.Berkeley.EDU, end2end-interest@postel.org, David Malone , Douglas Leith , Robert Shorten , Larry Dunn , Fred Baker Subject: Re: Modular/Pluggable TCP Congestion Control for FreeBSD X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 18 Dec 2007 12:55:13 -0000 Lawrence Stewart wrote: > Hi all, > > We've been involved in a research project to implement and test an > emerging TCP congestion control algorithm under FreeBSD. As a part of > this, we've put together a patch for FreeBSD 7.0-BETA4 that modularises > the congestion control code in the TCP stack. It allows for new > congestion control algorithms to be developed as loadable kernel modules. > > This improves FreeBSD's usefulness as a TCP research platform and makes > it easier to customise the stack for specific scenarios like high > bandwidth, long delay paths. > > There is an accompanying technical report "Light-Weight Modular > TCP Congestion Control for FreeBSD 7" [1] that covers the design, > features, kernel interface and usage of the framework. Also on our > website is > a beta release of a module that implements the H-TCP[2] congestion control > algorithm proposed by the Hamilton Institute. > > We believe that modular congestion control is a worthwhile addition to > FreeBSD. We've performed significant internal testing and there are > currently no known issues or regressions with the implementation > compared to a 'vanilla' FreeBSD 7.0-BETA4 kernel. We would welcome > further review and testing from the wider community in the hope of > getting this > patch folded into FreeBSD 8-CURRENT. > > SIFTR [3], our tool for monitoring FreeBSD kernel TCP connection state, > has also > received a minor update to v1.1.5, with the addition of 6 new, useful > variables. > > All code and documentation is available on our website[3]. I've started to completely overhaul tcp_input and tcp_output including separating out the congestion control. Actually it is similar to the way you seem to do it. A quick glance at your patch shows a couple of style issues and a complete lack of locking. Let me get you a Perforce account so we can develop and complete this work together. 
I'll create a Perforce branch and import my code and work in progress. -- Andre From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 14:17:43 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC8CE16A419; Tue, 18 Dec 2007 14:17:43 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 716BE13C4F2; Tue, 18 Dec 2007 14:17:43 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBIEHgwi017772; Tue, 18 Dec 2007 06:17:42 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBIEHg1v017771; Tue, 18 Dec 2007 06:17:42 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Tue, 18 Dec 2007 06:17:42 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071218141742.GS25053@tnn.dglawrence.com> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071218233644.U756@besplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Tue, 18 Dec 2007 06:17:42 -0800 (PST) Cc: freebsd-net@FreeBSD.ORG, Scott Long , freebsd-stable@FreeBSD.ORG Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 14:17:43 
-0000 > >Right, it's a non-optimal loop when N is very large, and that's a fairly > >well understood problem. I think what DG was getting at, though, is > >that this massive flush happens every time the syncer runs, which > >doesn't seem correct. Sure, maybe you just rsynced 100,000 files 20 > >seconds ago, so the upcoming flush is going to be expensive. But the > >next flush 30 seconds after that shouldn't be just as expensive, yet it > >appears to be so. > > I'm sure it doesn't cause many bogus flushes. iostat shows zero writes > caused by calling this incessantly using "while :; do sync; done". I didn't say it caused any bogus disk I/O. My original problem (after a day or two of uptime) was an occasional large scheduling delay for a process that needed to process VoIP frames in real-time. It was happening every 31 seconds and was causing voice frames to be dropped due to the large latency causing the frame to be outside of the jitter window. I wrote a program that measures the scheduling delay by sleeping for one tick and then comparing the timeofday offset from what was expected. This revealed that every 31 seconds, the process was seeing a 17ms delay in scheduling. Further investigation found that 1) the syncer was the process that was running every 31 seconds and causing the delay (and it was the only one in the system with that timing interval), and that 2) lowering the kern.maxvnodes to something lowish (5000) would mostly mitigate the problem. The patch to limit the number of vnodes to process in the loop before sleeping was then developed and it completely resolved the problem. Since the wait that I added is at the bottom of the loop and the limit is 500 vnodes, this tells me that every 31 seconds, there are a whole lot of vnodes that are being "synced", when there shouldn't have been any (this fact wasn't apparent to me at the time, but when I later realized this, I had no time to investigate further). 
My tests and analysis have all been on an otherwise quiet system (no disk I/O), so the bottom of the ffs_sync vnode loop should not have been reached at all, let alone tens of thousands of times every 31 seconds. All machines were uniprocessor, FreeBSD 6+. I don't know if this problem is present in 5.2. I didn't see ffs_syncvnode in your call graph, so it probably is not. Anyway, someone needs to instrument the vnode loop in ffs_sync and figure out what is going on. As you've pointed out, it is necessary to first read a lot of files (I use tar to /dev/null and make sure it reads at least 100K files) in order to get the vnodes allocated. As I mentioned previously, I suspect that either ip->i_flag is not getting completely cleared in ffs_syncvnode or its children or v_bufobj.bo_dirty.bv_cnt accounting is broken. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
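
[Editor's sketch] The scheduling-delay probe described in the message above (sleep for one tick, then compare the elapsed time with what was expected) can be approximated in user space. This is a minimal Python sketch, not the original program: the function names, the 1 ms default tick, and the injectable `sleep`/`clock` parameters are assumptions made for illustration.

```python
import time


def measure_overshoot(samples, tick=0.001, sleep=time.sleep, clock=time.monotonic):
    """Sleep for one tick `samples` times and return how late each wakeup was.

    `tick` is an assumed tick length in seconds; `sleep` and `clock` are
    injectable so the logic can be exercised without real waiting.
    """
    overshoots = []
    for _ in range(samples):
        start = clock()
        sleep(tick)
        elapsed = clock() - start
        # Anything beyond the requested tick is scheduling (or timer) delay.
        overshoots.append(max(0.0, elapsed - tick))
    return overshoots


def worst(overshoots):
    """Largest observed delay in seconds, or 0.0 if nothing was sampled."""
    return max(overshoots) if overshoots else 0.0
```

Run over a window longer than the syncer interval (for example 40,000 samples of 1 ms), a periodic spike in the returned list would correspond to the roughly 31-second stalls reported above. The absolute numbers depend on the host's timer resolution.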
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 14:19:14 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD2CC16A477 for ; Tue, 18 Dec 2007 14:19:14 +0000 (UTC) (envelope-from rea-fbsd@codelabs.ru) Received: from pobox.codelabs.ru (pobox.codelabs.ru [144.206.177.45]) by mx1.freebsd.org (Postfix) with ESMTP id 70D2C13C4FD for ; Tue, 18 Dec 2007 14:19:14 +0000 (UTC) (envelope-from rea-fbsd@codelabs.ru) DomainKey-Signature: a=rsa-sha1; q=dns; c=simple; s=one; d=codelabs.ru; h=Received:Date:From:To:Cc:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:Sender:X-Spam-Status:Subject; b=qmdkhdh4Of0GtKjT5yjpaSj7cBt/BL2p8/msE4dBZQ90rYVcpKusqiULkwf+S9RHzYtA3s2bnHbSfHrqHkI1gT1hp8aeVKll2S7TQTuJmAREt02mF9GTgColudtNVWL3eckpFt9/vy/Ee69b5SG+6PLriryFN8f75TsLEFi01Ak=; Received: from void.codelabs.ru (void.codelabs.ru [144.206.177.25]) by pobox.codelabs.ru with esmtpsa (TLSv1:AES256-SHA:256) id 1J4dHt-000GRh-1B; Tue, 18 Dec 2007 17:19:13 +0300 Date: Tue, 18 Dec 2007 17:19:11 +0300 From: Eygene Ryabinkin To: vermaden Message-ID: References: <20071214105845.E873D45B819@f49.poczta.interia.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <20071214105845.E873D45B819@f49.poczta.interia.pl> Sender: rea-fbsd@codelabs.ru X-Spam-Status: No, score=-1.9 required=4.0 tests=ALL_TRUSTED,AWL,BAYES_50 Cc: freebsd-net@freebsd.org Subject: Re: default route X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 14:19:14 -0000 Good day. 
Fri, Dec 14, 2007 at 11:58:45AM +0100, vermaden wrote: > > Fri, Dec 14, 2007 at 11:20:32AM +0100, vermaden wrote: > > > I already used tcpdump, if ICMP packet goes in thru 192.168/16 on rl1 > > the > > > response goes out on 10/24 on rl0. Fri, Dec 14, 2007 at 11:58:45AM +0100, vermaden wrote: > zenek# tcpdump -lvvni rl1 icmp > When I ping 169.254.169.171 (my FreeBSD box) from 169.254.169.24 (Linux) I get this: > > zenek# tcpdump -lvvni rl1 icmp [...] > For Both FreBSD --> Linux ping and Linux --> FreeBSD ping the tcpdump -lvvni rl1 arp > and tcpdump -lvvni rl1 icmp commands does not show any packets. ^^^ Since you showed output from 'rl1', maybe you meant 'rl0' here? [...] > tcpdump on rl0 still nothing. After reading this, I gather that you see absolutely no packets on either interface when your Linux box pings FreeBSD. But this contradicts your previous assertion that if an ICMP packet comes in on rl1, it is reflected on rl0. Am I missing something? -- Eygene From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 14:41:35 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05D5F16A418; Tue, 18 Dec 2007 14:41:35 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id C345413C45B; Tue, 18 Dec 2007 14:41:34 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBIEfXxC033146; Tue, 18 Dec 2007 06:41:33 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBIEfXlY033145; Tue, 18 Dec 2007 06:41:33 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Tue, 18 Dec 2007 06:41:33 
-0800 From: David G Lawrence To: Mark Fullmer Message-ID: <20071218144133.GT25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Tue, 18 Dec 2007 06:41:33 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 14:41:35 -0000 > Thanks. Have a kernel building now. It takes about a day of uptime > after reboot before I'll see the problem. You may also wish to try to get the problem to occur sooner after boot on a non-patched system by doing a "tar cf /dev/null /" (note: substitute /dev/zero instead of /dev/null, if you use GNU tar, to disable its "optimization"). You can stop it after it has gone through 100K files. Verify by looking at "sysctl vfs.numvnodes". Doing this would help to further prove that lots of allocated vnodes is the prerequisite for the problem. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
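
[Editor's sketch] The reproduction step above (read on the order of 100K files so that their vnodes get allocated, then check "sysctl vfs.numvnodes") can also be scripted. The Python below is a rough stand-in for the tar run, not a replacement for it: the function name `read_files`, the default limit, and the chunk size are assumptions, and whether it grows the vnode count as effectively as tar has not been verified.

```python
import os


def read_files(root, limit=100_000, chunk=65536):
    """Open and fully read regular files under `root`, stopping after `limit` files.

    Returns the number of files that were read. Symlinks, devices and
    unreadable files are skipped.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                with open(path, "rb") as f:
                    # Read the data and discard it, as tar to /dev/null would.
                    while f.read(chunk):
                        pass
            except OSError:
                continue  # unreadable or vanished file; skip it
            count += 1
            if count >= limit:
                return count
    return count
```

After a call such as `read_files("/")`, the vnode count can be checked with "sysctl vfs.numvnodes" as suggested in the message.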
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 15:19:38 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A31316A47B; Tue, 18 Dec 2007 15:19:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id AC6CF13C501; Tue, 18 Dec 2007 15:19:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBIFJSa2029561 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Dec 2007 02:19:31 +1100 Date: Wed, 19 Dec 2007 02:19:28 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071218144133.GT25053@tnn.dglawrence.com> Message-ID: <20071219020952.A34422@delplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> <20071218144133.GT25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 15:19:38 -0000 On Tue, 18 Dec 2007, David G Lawrence wrote: >> Thanks. Have a kernel building now. It takes about a day of uptime >> after reboot before I'll see the problem. 
> > You may also wish to try to get the problem to occur sooner after boot > on a non-patched system by doing a "tar cf /dev/null /" (note: substitute > /dev/zero instead of /dev/null, if you use GNU tar, to disable its > "optimization"). You can stop it after it has gone through a 100K files. > Verify by looking at "sysctl vfs.numvnodes". Hmm, I said to use "find /", but that is not so good since it only looks at directories and directories (and their inodes) are not packed as tightly as files (and their inodes). Optimized tar, or "find / -type f", or "ls -lR /", should work best, by doing not much more than stat()ing lots of files, while full tar wastes time reading file data. Bruce From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 15:22:19 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 607B416A420; Tue, 18 Dec 2007 15:22:19 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 25E2A13C4E8; Tue, 18 Dec 2007 15:22:19 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBIFMI5p059376; Tue, 18 Dec 2007 07:22:18 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBIFMIUi059375; Tue, 18 Dec 2007 07:22:18 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Tue, 18 Dec 2007 07:22:18 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071218152217.GU25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <089C1524-B8E0-4C70-B69A-ECBE0C8DFC90@eng.oar.net> <20071218144133.GT25053@tnn.dglawrence.com> <20071219020952.A34422@delplex.bde.org> 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071219020952.A34422@delplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Tue, 18 Dec 2007 07:22:18 -0800 (PST) Cc: freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 15:22:20 -0000 > On Tue, 18 Dec 2007, David G Lawrence wrote: > > >>Thanks. Have a kernel building now. It takes about a day of uptime > >>after reboot before I'll see the problem. > > > > You may also wish to try to get the problem to occur sooner after boot > >on a non-patched system by doing a "tar cf /dev/null /" (note: substitute > >/dev/zero instead of /dev/null, if you use GNU tar, to disable its > >"optimization"). You can stop it after it has gone through a 100K files. > >Verify by looking at "sysctl vfs.numvnodes". > > Hmm, I said to use "find /", but that is not so good since it only > looks at directories and directories (and their inodes) are not packed > as tightly as files (and their inodes). Optimized tar, or "find / > -type f", or "ls -lR /", should work best, by doing not much more than > stat()ing lots of files, while full tar wastes time reading file data. I have no reason to believe that just reading directories will reproduce the problem with file vnodes. You need to open the files and read them. Nothing else will do. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 15:25:09 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E63A16A420 for ; Tue, 18 Dec 2007 15:25:09 +0000 (UTC) (envelope-from rpaulo@gmail.com) Received: from fk-out-0910.google.com (fk-out-0910.google.com [209.85.128.191]) by mx1.freebsd.org (Postfix) with ESMTP id C02A813C4CC for ; Tue, 18 Dec 2007 15:25:08 +0000 (UTC) (envelope-from rpaulo@gmail.com) Received: by fk-out-0910.google.com with SMTP id b27so2878157fka.11 for ; Tue, 18 Dec 2007 07:25:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:message-id:from:to:cc:subject:in-reply-to:references:user-agent:x-cite-me:mime-version:content-type:sender; bh=yA2enVgHlU9452o7I76LQRYmPQp79E0vtXQXBEyGFz0=; b=DTiVGNExpuZiSOz8kuaGn9eSnIZdc3XHni5KqejOQuMJNp/g+ZOzo2zAGszqteFB975dAI7Kbdpw9fqz36PqTObkNf3fF1b0Zy+UAjfzGBoF5TP36OaNU/GWqIvE0HgzTnJImCseTqHHIXI9Y8C+pCGD15F4iMQT9/fzV2Nq9C0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:message-id:from:to:cc:subject:in-reply-to:references:user-agent:x-cite-me:mime-version:content-type:sender; b=u0qsELBaZjnUUqh0l4FSgqcJP7pfKOjARxx2jcd78eDPVeX3YOZbCYbRTRpjcB9SVDjCyDUtEI/H7U2rv7EiI+YRwdrEzM/pvZid0n560ZaTGTSHk6JO9o1jDCS/sZQh2l55XkKENAxbEsWGBgcD4XU6CG2gIFwHopiwhXhXTH4= Received: by 10.82.107.15 with SMTP id f15mr2969788buc.0.1197991506787; Tue, 18 Dec 2007 07:25:06 -0800 (PST) Received: from epsilon.local.gmail.com ( [193.126.201.240]) by mx.google.com with ESMTPS id m8sm6291900gvf.2007.12.18.07.25.05 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 18 Dec 2007 07:25:06 -0800 (PST) Date: Tue, 18 Dec 2007 15:24:40 +0000 Message-ID: <86sl20nhzb.wl%rpaulo@fnop.net> From: Rui Paulo To: Andre Oppermann In-Reply-To: <4767C338.4070709@freebsd.org> References: <47675291.5070101@swin.edu.au> <4767C338.4070709@freebsd.org> 
User-Agent: Wanderlust/2.15.5 (Almost Unreal) Emacs/21.3 Mule/5.0 (SAKAKI) X-cite-me: rpaulo MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Sender: Rui Cc: "freebsd-net@freebsd.org" Subject: Re: Modular/Pluggable TCP Congestion Control for FreeBSD X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 15:25:09 -0000 Rui Paulo At Tue, 18 Dec 2007 13:55:20 +0100, Andre Oppermann wrote: > > Lawrence Stewart wrote: > > Hi all, > > We've been involved in a research project to implement and test an > > emerging TCP congestion control algorithm under FreeBSD. As a part of > > this, we've put together a patch for FreeBSD 7.0-BETA4 that modularises > > the congestion control code in the TCP stack. It allows for new > > congestion control algorithms to be developed as loadable kernel modules. > > This improves FreeBSD's usefulness as a TCP research platform and > > makes > > it easier to customise the stack for specific scenarios like high > > bandwidth, long delay paths. > > There is an accompanying technical report "Light-Weight Modular > > TCP Congestion Control for FreeBSD 7" [1] that covers the design, > > features, kernel interface and usage of the framework. Also on our > > website is > > a beta release of a module that implements the H-TCP[2] congestion control > > algorithm proposed by the Hamilton Institute. > > We believe that modular congestion control is a worthwhile addition > > to > > FreeBSD. We've performed significant internal testing and there are > > currently no known issues or regressions with the implementation > > compared to a 'vanilla' FreeBSD 7.0-BETA4 kernel. We would welcome > > further review and testing from the wider community in the hope of > > getting this > > patch folded into FreeBSD 8-CURRENT. 
> > SIFTR [3], our tool for monitoring FreeBSD kernel TCP connection > > state, has also > > received a minor update to v1.1.5, with the addition of 6 new, > > useful variables. > > All code and documentation is available on our website[3]. > > I've started to completely overhaul tcp_input and tcp_output > including separating out the congestion control. Actually it > is similiar to the way you seem to do it. > > A quick glance at your patch shows a couple of style issues > and a complete lack of locking. > > Let me get you a Perforce account so we can develop and complete > this work together. I'll create a Perforce branch and import my > code and work in progress. Andre, perhaps you could keep us, committers, more up to date with your projects? I've been asked to port my NetBSD tcp_congctl(9) API to FreeBSD at some point in time and I wasn't aware that you were working on the same thing. Perhaps you could create a branch for you in p4 so we could follow your work? Thanks. -- Rui Paulo From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 16:20:40 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8861C16A46B; Tue, 18 Dec 2007 16:20:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail18.syd.optusnet.com.au (mail18.syd.optusnet.com.au [211.29.132.199]) by mx1.freebsd.org (Postfix) with ESMTP id 10CCA13C4D1; Tue, 18 Dec 2007 16:20:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail18.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBIGKSE0008461 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Dec 2007 03:20:31 +1100 Date: Wed, 19 Dec 2007 03:20:28 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: 
<20071218141742.GS25053@tnn.dglawrence.com> Message-ID: <20071219022102.I34422@delplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.org, Scott Long , freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 16:20:40 -0000 On Tue, 18 Dec 2007, David G Lawrence wrote: > I didn't say it caused any bogus disk I/O. My original problem > (after a day or two of uptime) was an occasional large scheduling delay > for a process that needed to process VoIP frames in real-time. It was > happening every 31 seconds and was causing voice frames to be dropped > due to the large latency causing the frame to be outside of the jitter > window. I wrote a program that measures the scheduling delay by sleeping > for one tick and then comparing the timeofday offset from what was > expected. This revealed that every 31 seconds, the process was seeing > a 17ms delay in scheduling. Further investigation found that 1) the I got an almost identical delay (with 64000 vnodes). Now, 17ms isn't much. Delays must have been much longer when CPUs were many times slower and RAM/vnodes were not so many times smaller. High-priority threads just need to be able to preempt the syncer so that they don't lose data (unless really hard real time is supported, which it isn't). This should work starting with about FreeBSD-6 (probably need "options PREEMPT"). It doesn't work in ~5.2 due to Giant locking, but I find Giant locking to rarely matter for UP. 
Old versions of FreeBSD were only able to preempt to non-threads (interrupt handlers) yet they somehow survived the longer delays. They didn't have Giant locking to get in the way, and presumably avoided packet loss by doing lots in interrupt handlers (hardware isr and netisr). I just remembered that I have seen packet loss even under -current when I leave out or turn off "options PREEMPT". > ... > and it completely resolved the problem. Since the wait that I added > is at the bottom of the loop and the limit is 500 vnodes, this tells > me that every 31 seconds, there are a whole lot of vnodes that are > being "synced", when there shouldn't have been any (this fact wasn't > apparent to me at the time, but when I later realized this, I had > no time to investigate further). My tests and analysis have all been > on an otherwise quiet system (no disk I/O), so the bottom of the > ffs_sync vnode loop should not have been reached at all, let alone > tens of thousands of times every 31 seconds. All machines were uni- > processor, FreeBSD 6+. I don't know if this problem is present in 5.2. > I didn't see ffs_syncvnode in your call graph, so it probably is not. I chopped to a flat profile with only top callers. Any significant calls from ffs_sync() would show up as top callers. 
I still have the data, and the call graph shows much more clearly that there was just one dirty vnode for the whole sync():

%                 0.00    0.01       1/1       syscall [3]
% [4]     88.7    0.00    0.01       1         sync [4]
%                 0.01    0.00       5/5       ffs_sync [5]
%                 0.01    0.00       6/6       vfs_msync [6]
%                 0.00    0.00       7/8       vfs_busy [260]
%                 0.00    0.00       7/8       vfs_unbusy [263]
%                 0.00    0.00       6/7       vn_finished_write [310]
%                 0.00    0.00       6/6       vn_start_write [413]
%                 0.00    0.00       1/1       vfs_stdnosync [472]
%
% -----------------------------------------------
%
%                 0.01    0.00       5/5       sync [4]
% [5]     50.7    0.01    0.00       5         ffs_sync [5]
%                 0.00    0.00       1/1       ffs_fsync [278]
%                 0.00    0.00       1/60      vget [223]
%                 0.00    0.00       1/60      ufs_vnoperatespec [78]
%                 0.00    0.00       1/26      vrele [76]

It passed the flags test just once to get to the vget(). ffs_syncvnode() doesn't exist in 5.2, and ffs_fsync() is called instead.

%
% -----------------------------------------------
%
%                 0.01    0.00       6/6       sync [4]
% [6]     38.0    0.01    0.00       6         vfs_msync [6]
%
% -----------------------------------------------
% ...
%
%                 0.00    0.00       1/1       ffs_sync [5]
% [278]    0.0    0.00    0.00       1         ffs_fsync [278]
%                 0.00    0.00       1/1       ffs_update [368]
%                 0.00    0.00       1/4       vn_isdisk [304]

This is presumably to sync the 1 dirty vnode. BTW I use noatime a lot, including for all file systems used in the test, so the tree walk didn't dirty any vnodes. A tar to /dev/zero would dirty all vnodes if everything were mounted without this option.

% ...
% % cumulative self self total % time seconds seconds calls ns/call ns/call name % 50.7 0.008 0.008 5 1666427 1667246 ffs_sync [5] % 38.0 0.015 0.006 6 1041217 1041217 vfs_msync [6] % 3.1 0.015 0.001 0 100.00% mcount [7] % 1.5 0.015 0.000 0 100.00% mexitcount [8] % 0.6 0.015 0.000 0 100.00% cputime [22] % 0.6 0.016 0.000 34 2660 2660 generic_bcopy [24] % 0.5 0.016 0.000 0 100.00% user [26] Bruce From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 16:57:33 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C69C16A417; Tue, 18 Dec 2007 16:57:33 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 20E1213C459; Tue, 18 Dec 2007 16:57:32 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBIGvWsD020266; Tue, 18 Dec 2007 08:57:32 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBIGvWU6020265; Tue, 18 Dec 2007 08:57:32 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Tue, 18 Dec 2007 08:57:32 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071218165732.GV25053@tnn.dglawrence.com> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071219022102.I34422@delplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com 
[127.0.0.1]); Tue, 18 Dec 2007 08:57:32 -0800 (PST) Cc: freebsd-net@FreeBSD.org, Scott Long , freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 16:57:33 -0000 > I got an almost identical delay (with 64000 vnodes). > > Now, 17ms isn't much. Says you. On modern systems, trying to run a pseudo real-time application on an otherwise quiescent system, 17ms is just short of an eternity. I agree that the syncer should be preemptable (which is what my bandaid patch attempts to do), but that probably wouldn't have helped my specific problem since my application was a user process, not a kernel thread. All of my systems have options PREEMPTION - that is the default in 6+. It doesn't affect this problem. On the other hand, the syncer shouldn't be consuming this much CPU in the first place. There is obviously a bug here. Of course, looking through all of the vnodes in the system for something dirty is stupid in the first place; there should be a separate list for that. ...but a simple fix is what is needed right now. I'm going to have to bow out of this discussion now. I just don't have the time for it. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities.
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 17:20:58 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8285416A418 for ; Tue, 18 Dec 2007 17:20:58 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: from smtp4.poczta.interia.pl (smtp35.poczta.interia.pl [80.48.65.35]) by mx1.freebsd.org (Postfix) with ESMTP id 2F22213C44B for ; Tue, 18 Dec 2007 17:20:57 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: by smtp4.poczta.interia.pl (INTERIA.PL, from userid 502) id E354450606F; Tue, 18 Dec 2007 18:20:56 +0100 (CET) Received: from f32.poczta.interia.pl (f32.poczta.interia.pl [10.217.2.32]) by smtp4.poczta.interia.pl (INTERIA.PL) with ESMTP id 50B69505F6E; Tue, 18 Dec 2007 18:20:56 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by f32.poczta.interia.pl (Postfix) with ESMTP id D14CC160055; Tue, 18 Dec 2007 18:20:53 +0100 (CET) Date: 18 Dec 2007 18:20:53 +0100 From: vermaden To: Eygene Ryabinkin MIME-Version: 1.0 Content-Type: TEXT/plain; CHARSET=ISO-8859-2 Content-Transfer-Encoding: QUOTED-PRINTABLE X-ORIGINATE-IP: 85.89.167.26 X-Mailer: PSE Message-Id: <20071218172055.D14CC160055@f32.poczta.interia.pl> X-EMID: b7f40acc Cc: freebsd-net@freebsd.org Subject: Re: default route X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 17:20:58 -0000 > > tcpdump on rl0 still nothing. >=20 > After reading this I feel that you have absolutely no packets on > either interfaces when your Linux box ping FreeBSD. But this > contradicts with your previous assertion that if ICMP packet comes > in on rl1, then it is reflected at rl0. Am I missing something? 
> -- > Eygene Yes, I must have misread that: rl0 is also 'dead' while the Linux box pings my FreeBSD box using the net on rl1. Regards vermaden From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 18:05:55 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF9A216A418; Tue, 18 Dec 2007 18:05:55 +0000 (UTC) (envelope-from sam@errno.com) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id 923FF13C465; Tue, 18 Dec 2007 18:05:50 +0000 (UTC) (envelope-from sam@errno.com) Received: from trouble.errno.com (trouble.errno.com [10.0.0.248]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id lBIHddW3010769 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Dec 2007 09:39:39 -0800 (PST) (envelope-from sam@errno.com) Message-ID: <476805DB.6020408@errno.com> Date: Tue, 18 Dec 2007 09:39:39 -0800 From: Sam Leffler User-Agent: Thunderbird 2.0.0.9 (X11/20071125) MIME-Version: 1.0 To: "Andrey V. Elsukov" References: <8c1eada80712170043w216b36b7gb5de6a149b952604@mail.gmail.com> <476673A6.8050306@yandex.ru> In-Reply-To: <476673A6.8050306@yandex.ru> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC-Rhyolite-Metrics: o.com; whitelist Cc: freebsd-net@freebsd.org, Krishna Kumar , freebsd-drivers@freebsd.org Subject: Re: WOL suport in Broadcom 5721 (57XX) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 18:05:56 -0000 Andrey V. Elsukov wrote: > Krishna Kumar wrote: >> Is this in the list of todo's??
>> Can this feature not be supported due to design issues? >> Is somebody trying this out somewhere? >> Please do copy me on the reply as I am not subscribed to the list. > > Look into freebsd-hackers@ mail archive. In the previous month there > was a discussion about WOL support. Look to topics: > 1. FreeBSD WOL sis on > 2. How to add wake on lan support for your card > > And as i remember, Sam Leffer has made some work for WOL support. > All I did was add the trivial ifconfig knobs. Stefan Sperling was doing the heavy lifting of making the various drivers WOL aware. Jack Vogel indicated em already has the necessary support in the driver so hooking it up shouldn't be a big deal. Broadcom devices are another matter. Sam From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 18:10:24 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4388916A41A; Tue, 18 Dec 2007 18:10:24 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 083A813C4E8; Tue, 18 Dec 2007 18:10:23 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBIIANZd067229; Tue, 18 Dec 2007 10:10:23 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBIIANK3067228; Tue, 18 Dec 2007 10:10:23 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Tue, 18 Dec 2007 10:10:23 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071218181023.GW25053@tnn.dglawrence.com> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> 
<20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071218165732.GV25053@tnn.dglawrence.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Tue, 18 Dec 2007 10:10:23 -0800 (PST) Cc: freebsd-net@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 18:10:24 -0000 > > I got an almost identical delay (with 64000 vnodes). > > > > Now, 17ms isn't much. > > Says you. On modern systems, trying to run a pseudo real-time application > on an otherwise quiescent system, 17ms is just short of an eternity. I agree > that the syncer should be preemptable (which is what my bandaid patch > attempts to do), but that probably wouldn't have helped my specific problem > since my application was a user process, not a kernel thread. One more followup (I swear I'm done, really!)... I have a laptop here that runs at 150MHz when it is in the lowest running CPU power save mode. At that speed, this bug causes a delay of more than 300ms and is enough to cause loss of keyboard input. I have to switch into high speed mode before I try to type anything, else I end up with random typos. Very annoying. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 19:21:29 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F36A716A419; Tue, 18 Dec 2007 19:21:28 +0000 (UTC) (envelope-from fred@cisco.com) Received: from sj-iport-2.cisco.com (sj-iport-2-in.cisco.com [171.71.176.71]) by mx1.freebsd.org (Postfix) with ESMTP id B57DF13C44B; Tue, 18 Dec 2007 19:21:28 +0000 (UTC) (envelope-from fred@cisco.com) Received: from sj-dkim-2.cisco.com ([171.71.179.186]) by sj-iport-2.cisco.com with ESMTP; 18 Dec 2007 10:53:41 -0800 Received: from sj-core-1.cisco.com (sj-core-1.cisco.com [171.71.177.237]) by sj-dkim-2.cisco.com (8.12.11/8.12.11) with ESMTP id lBIIrfcu032120; Tue, 18 Dec 2007 10:53:41 -0800 Received: from xbh-sjc-211.amer.cisco.com (xbh-sjc-211.cisco.com [171.70.151.144]) by sj-core-1.cisco.com (8.12.10/8.12.6) with ESMTP id lBIIrC43016030; Tue, 18 Dec 2007 18:53:20 GMT Received: from xfe-sjc-211.amer.cisco.com ([171.70.151.174]) by xbh-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Tue, 18 Dec 2007 10:53:19 -0800 Received: from [10.32.244.220] ([10.32.244.220]) by xfe-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Tue, 18 Dec 2007 10:53:18 -0800 In-Reply-To: <4767C338.4070709@freebsd.org> References: <47675291.5070101@swin.edu.au> <4767C338.4070709@freebsd.org> Mime-Version: 1.0 (Apple Message framework v752.3) X-Gpgmail-State: !signed Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <7B13FFEE-F69D-4EBA-B734-356D8E63FDC7@cisco.com> Content-Transfer-Encoding: 7bit From: Fred Baker Date: Tue, 18 Dec 2007 10:53:18 -0800 To: Lawrence Stewart , Andre Oppermann , James Healy X-Mailer: Apple Mail (2.752.3) X-OriginalArrivalTime: 18 Dec 2007 18:53:18.0856 (UTC) FILETIME=[41284080:01C841A7] DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; l=2180; t=1198004021; x=1198868021; c=relaxed/simple; s=sjdkim2002; 
h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=fred@cisco.com; z=From:=20Fred=20Baker=20 |Subject:=20Re=3A=20Modular/Pluggable=20TCP=20Congestion=20 Control=20for=20FreeBSD |Sender:=20; bh=PrIDNDY5OOtY96ZdDKlXD+LYA1mBGPx6zMFQfBB4M28=; b=c1ZdDhmlZgPbfNtHDkMmW3oByPaNB5QblYWwejzh5XG1lV4VVT2cKwyp/9 E6+YEAEVaxP1Fc5tMJqzgUOcfCwyHfHn2O/KsXG1yFzXuisFunFcpZxbPl0X t/4U+JwarM; Authentication-Results: sj-dkim-2; header.From=fred@cisco.com; dkim=pass ( sig from cisco.com/sjdkim2002 verified; ); X-Mailman-Approved-At: Tue, 18 Dec 2007 21:08:26 +0000 Cc: grenville armitage , freebsd-net@freebsd.org, Iccrg@cs.ucl.ac.uk, Randall Stewart , tmrg-interest@ICSI.Berkeley.EDU, end2end-interest@postel.org, David Malone , Douglas Leith , Robert Shorten , Larry Dunn Subject: Re: Modular/Pluggable TCP Congestion Control for FreeBSD X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 19:21:29 -0000 Thanks to each of you. On Dec 18, 2007, at 4:55 AM, Andre Oppermann wrote: > Lawrence Stewart wrote: >> Hi all, >> We've been involved in a research project to implement and test an >> emerging TCP congestion control algorithm under FreeBSD. As a part of >> this, we've put together a patch for FreeBSD 7.0-BETA4 that >> modularises >> the congestion control code in the TCP stack. It allows for new >> congestion control algorithms to be developed as loadable kernel >> modules. >> This improves FreeBSD's usefulness as a TCP research platform and >> makes >> it easier to customise the stack for specific scenarios like high >> bandwidth, long delay paths. >> There is an accompanying technical report "Light-Weight Modular >> TCP Congestion Control for FreeBSD 7" [1] that covers the design, >> features, kernel interface and usage of the framework. 
Also on our >> website is >> a beta release of a module that implements the H-TCP[2] congestion >> control >> algorithm proposed by the Hamilton Institute. >> We believe that modular congestion control is a worthwhile >> addition to >> FreeBSD. We've performed significant internal testing and there are >> currently no known issues or regressions with the implementation >> compared to a 'vanilla' FreeBSD 7.0-BETA4 kernel. We would welcome >> further review and testing from the wider community in the hope of >> getting this >> patch folded into FreeBSD 8-CURRENT. >> SIFTR [3], our tool for monitoring FreeBSD kernel TCP connection >> state, has also >> received a minor update to v1.1.5, with the addition of 6 new, >> useful variables. >> All code and documentation is available on our website[3]. > > I've started to completely overhaul tcp_input and tcp_output > including separating out the congestion control. Actually it > is similar to the way you seem to do it. > > A quick glance at your patch shows a couple of style issues > and a complete lack of locking. > > Let me get you a Perforce account so we can develop and complete > this work together. I'll create a Perforce branch and import my > code and work in progress. > > -- > Andre From owner-freebsd-net@FreeBSD.ORG Tue Dec 18 21:49:31 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F23016A41A for ; Tue, 18 Dec 2007 21:49:31 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 3741013C4EA for ; Tue, 18 Dec 2007 21:49:31 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 84735 invoked from network); 18 Dec 2007 21:49:30 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?)
(192.148.251.71) by sv1.eng.oar.net with SMTP; 18 Dec 2007 21:49:30 -0000 In-Reply-To: <20071217102433.GQ25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Tue, 18 Dec 2007 16:49:14 -0500 To: David G Lawrence X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Dec 2007 21:49:31 -0000

A little progress.

I have a machine with a KTR enabled kernel running.

Another machine is running David's ffs_vfsops.c patch.

I left two other machines (GENERIC kernels) running the packet loss test overnight. At ~32480 seconds of uptime the problem starts. This is really close to a 16 bit overflow... See http://www.eng.oar.net/~maf/bsd6/p1.png and http://www.eng.oar.net/~maf/bsd6/p2.png. The missing impulses at 31 second marks are the intervals between test runs. The window of missing packets (timestamps between two packets where a sequence number is missing) is usually less than 4us, although I'm not sure gettimeofday() can be trusted for measuring this. See https://www.eng.oar.net/~maf/bsd6/p3.png

Things I'll try tonight:

  o check on the patched kernel

  o Try KTR debugging enabled before and after an expected high latency period.

  o Dump all files to /dev/null to trigger the behavior.

I would expect the vnode problem to look a little different on the packet loss graphs over time. If this leads anywhere I'll add a counter before the msleep() and see how often it's getting there.
On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote:

>    I noticed this as well some time ago. The problem has to do with the
> processing (syncing) of vnodes. When the total number of allocated vnodes
> in the system grows to tens of thousands, the ~31 second periodic sync
> process takes a long time to run. Try this patch and let people know if
> it helps your problem. It will periodically wait for one tick (1ms) every
> 500 vnodes of processing, which will allow other things to run.
>
> Index: ufs/ffs/ffs_vfsops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
> retrieving revision 1.290.2.16
> diff -c -r1.290.2.16 ffs_vfsops.c
> *** ufs/ffs/ffs_vfsops.c        9 Oct 2006 19:47:17 -0000       1.290.2.16
> --- ufs/ffs/ffs_vfsops.c        25 Apr 2007 01:58:15 -0000
> ***************
> *** 1109,1114 ****
> --- 1109,1115 ----
>         int softdep_deps;
>         int softdep_accdeps;
>         struct bufobj *bo;
> +       int flushed_count = 0;
>
>         fs = ump->um_fs;
>         if (fs->fs_fmod != 0 && fs->fs_ronly != 0) {            /* XXX */
> ***************
> *** 1174,1179 ****
> --- 1175,1184 ----
>                         allerror = error;
>                 vput(vp);
>                 MNT_ILOCK(mp);
> +               if (flushed_count++ > 500) {
> +                       flushed_count = 0;
> +                       msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
> +               }
>         }
>         MNT_IUNLOCK(mp);
>         /*
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.
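The idea in the quoted patch (pause briefly after each batch of vnodes so that other work can run) can be sketched in userland C. This is only an illustration under stated assumptions: scan_with_yield(), its batch parameter and the process callback are hypothetical names, nanosleep(2) with a 1 ms pause stands in for the kernel's msleep() with a one-tick timeout, and no lock is involved here, whereas msleep() in the patch releases the mount interlock while it sleeps.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */

#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Walk nitems work items, pausing for about 1 ms after every `batch`
 * items so that other runnable work gets a turn.  Returns the number
 * of pauses taken.  `process` may be NULL for a dry run.
 *
 * Hypothetical userland analogue of the patch; not kernel code.
 */
static size_t
scan_with_yield(size_t nitems, size_t batch, void (*process)(size_t))
{
	const struct timespec pause = { 0, 1000000 };	/* 1 ms, assumed tick */
	size_t since_yield = 0;
	size_t pauses = 0;

	assert(batch > 0);

	for (size_t i = 0; i < nitems; i++) {
		if (process != NULL)
			process(i);
		if (++since_yield >= batch) {
			since_yield = 0;
			nanosleep(&pause, NULL);	/* let others run */
			pauses++;
		}
	}
	return (pauses);
}
```

With a batch of 500, a call such as scan_with_yield(64000, 500, NULL) would pause 128 times. The cost of this approach is that total scan time grows by roughly one pause per batch, which is why the patch is described in the thread as a bandaid rather than a fix.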
From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 12:09:49 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 39A1816A41A; Wed, 19 Dec 2007 12:09:49 +0000 (UTC) (envelope-from mux@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 1E8B913C457; Wed, 19 Dec 2007 12:09:49 +0000 (UTC) (envelope-from mux@freebsd.org) Received: by elvis.mu.org (Postfix, from userid 1920) id 8FC3A1A4D86; Wed, 19 Dec 2007 04:08:31 -0800 (PST) Date: Wed, 19 Dec 2007 13:08:31 +0100 From: Maxime Henrion To: net@FreeBSD.org Message-ID: <20071219120831.GN71713@elvis.mu.org> References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="jCrbxBqMcLqd4mOl" Content-Disposition: inline In-Reply-To: <20071217101009.GL71713@elvis.mu.org> User-Agent: Mutt/1.4.2.3i Cc: Gleb Smirnoff , Julian Elischer Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 12:09:49 -0000 --jCrbxBqMcLqd4mOl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Maxime Henrion wrote: > Julian Elischer wrote: > > Gleb Smirnoff wrote: > > >On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote: > > >J> Maxime Henrion wrote: > > >J> > Replying to myself on this one, sorry about that. 
> > >J> > I said in my previous mail that I didn't know yet what process was > > >J> > holding the lock of the rtentry that the routed process is dealing > > >J> > with in rt_setgate(), and I just could verify that it is held by > > >J> > the swi1: net thread. > > >J> > So, in a nutshell: > > >J> > - The routed process does its business on the routing socket, that > > >ends up > > >J> > calling rt_setgate(). While in rt_setgate() it drops the lock on > > >its > > >J> > rtentry in order to call rtalloc1(). At this point, the routed > > >J> > process hold the gateway route (rtalloc1() returns it locked), and > > >it > > >J> > now tries to re-lock the original rtentry. > > >J> > - At the same time, the swi net thread calls arpresolve() which ends > > >up > > >J> > calling rt_check(). Then rt_check() locks the rtentry, and tries to > > >J> > lock the gateway route. > > >J> > A classical case of deadlock with mutexes because of different locking > > >J> > order. Now, it's not obvious to me how to fix it :-). > > >J> > > >J> On failure to re-lock, the routed call to rt_setgate should completely > > >abort J> and restart from scratch, releasing all locks it has on the way > > >out. > > > > > >Do you suggest mtx_trylock? > > > > I think that would be the cleanest way.. > > So, here's what I've got. I have yet to test it at all, I hope that > I'll be able to do so today, or tomorrow. Any input appreciated. It appears that this patch fixed the problem. My gateway server now has a nearly two days uptime, whereas previously it would have probably crashed already. I'm attaching the final version of the patch here, since the last one had build-time errors. I'm going to commit this in HEAD soon unless someone has an objection for it. 
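The trylock-and-restart approach discussed above can be illustrated with a small userland sketch. It is not the kernel code: POSIX mutexes stand in for the rtentry mutexes, and struct entry and lock_pair_backoff() are hypothetical names made up for this example. The caller enters holding the first lock, as rt_setgate() does with rt.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an rtentry: a mutex plus a payload. */
struct entry {
	pthread_mutex_t	mtx;
	int		value;
};

/*
 * Called with a->mtx held.  The caller also needs b->mtx, but other
 * threads take the two locks in the opposite order, so blocking on the
 * second lock while holding the first could deadlock.  Instead: drop a,
 * take b, then only *try* to re-take a.  If that fails, release b,
 * re-take a with nothing else held, and start over.
 *
 * Returns with both locks held.  If retries is non-NULL it is
 * incremented once per restart.
 */
static void
lock_pair_backoff(struct entry *a, struct entry *b, unsigned *retries)
{
	assert(a != b);

	for (;;) {
		pthread_mutex_unlock(&a->mtx);
		pthread_mutex_lock(&b->mtx);
		if (pthread_mutex_trylock(&a->mtx) == 0)
			return;			/* both locks held */
		/* Back off: undo everything and restore the entry state. */
		pthread_mutex_unlock(&b->mtx);
		pthread_mutex_lock(&a->mtx);
		if (retries != NULL)
			(*retries)++;
	}
}
```

The blocking lock of a in the back-off path is safe because no other lock is held at that point. A thread that keeps losing the race can spin through several restarts, so this trades the deadlock for some possible extra work under contention.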
Cheers,
Maxime

--jCrbxBqMcLqd4mOl
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="rt_setgate.patch"

--- route.h.orig        Tue Apr  4 22:07:23 2006
+++ route.h     Mon Dec 17 13:11:44 2007
@@ -289,6 +289,7 @@
 #define RT_LOCK_INIT(_rt) \
         mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK)
 #define RT_LOCK(_rt)            mtx_lock(&(_rt)->rt_mtx)
+#define RT_TRYLOCK(_rt)         mtx_trylock(&(_rt)->rt_mtx)
 #define RT_UNLOCK(_rt)          mtx_unlock(&(_rt)->rt_mtx)
 #define RT_LOCK_DESTROY(_rt)    mtx_destroy(&(_rt)->rt_mtx)
 #define RT_LOCK_ASSERT(_rt)     mtx_assert(&(_rt)->rt_mtx, MA_OWNED)
--- route.c.orig        Tue Oct 30 19:07:54 2007
+++ route.c     Mon Dec 17 15:13:20 2007
@@ -996,6 +996,7 @@
         struct radix_node_head *rnh = rt_tables[dst->sa_family];
         int dlen = SA_SIZE(dst), glen = SA_SIZE(gate);
 
+again:
         RT_LOCK_ASSERT(rt);
 
         /*
@@ -1029,7 +1030,15 @@
                 RT_REMREF(rt);
                 return (EADDRINUSE); /* failure */
         }
-        RT_LOCK(rt);
+        /*
+         * Try to reacquire the lock on rt, and if it fails,
+         * clean state and restart from scratch.
+         */
+        if (!RT_TRYLOCK(rt)) {
+                RTFREE_LOCKED(gwrt);
+                RT_LOCK(rt);
+                goto again;
+        }
         /*
          * If there is already a gwroute, then drop it. If we
          * are asked to replace route with itself, then do

--jCrbxBqMcLqd4mOl--

From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 13:24:52 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54E0216A421 for ; Wed, 19 Dec 2007 13:24:52 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 115CB13C459 for ; Wed, 19 Dec 2007 13:24:51 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 859A247E50; Wed, 19 Dec 2007 08:09:13 -0500 (EST) Date: Wed, 19 Dec 2007 13:09:13 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: net@FreeBSD.org Message-ID: <20071219123305.Y95322@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: James Healy , arch@FreeBSD.org, Lawrence Stewart Subject: Coordinating TCP projects X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 13:24:52 -0000 Dear all, It is rapidly becoming clear that quite a few of us have Big Plans for the TCP implementation over the next 12-18 months. It's important that we get the plans out on the table now so that everyone working on these projects is aware of the larger context. This will encourage collaboration, but also allow us to manage the risks inevitably associated with having several simultaneous projects going on in a very complex software base.
With that in mind, here are the large projects I'm currently aware of:

Project                     Flag Wavers         Status
-------                     -----------         ------
TCP offload                 Kip Macy            Moving to CVS and under review
                                                and testing; one supporting
                                                device driver.

TCP congestion control      Sam Leffler,        At least one prototype
                            Rui Paulo,          implementation, to move to p4
                            Andre Oppermann,
                            Kip Macy,
                            Lawrence Stewart,
                            James Healy

TCP overhaul                Andre Oppermann     Glimmer in eye, to move to p4.

TCP lock granularity/       Robert Watson       Glimmer in eye, to occur in
increased parallelism                           p4.

TCP timer unification       Andre Oppermann,    Previously committed, and to
                            Mike Silbersack     be reintroduced via p4.

Monitoring ABI cleanup      Robert Watson       Glimmer in eye, to occur in
                                                p4.

Looking at the above, it sounds like a massive amount of work taking place, so we will need to coordinate carefully. I'd like to encourage people to avoid creating unnecessary dependencies between changes, and to be especially careful in coordinating potentially MFCable changes. There are (at least) two conflicting scheduling desires in play here:

- A desire to merge MFCable changes early, so that they aren't entangled with un-mergeable changes. This will simplify merging and also maximize the extent to which testing in HEAD will apply to them once merged to RELENG_7.

- A desire to merge large-scale infrastructural changes early so that they see the greatest exposure, and so that they can be introduced incrementally over a longer period of time to shake each out.

Both of these are valid perspectives, and will need to be balanced.

I have a few questions, then, for people involved in these or other projects:

(0) Is your project in the above list? If not, could you send out a reply talking a bit about the project, who's involved, where it's taking place, etc.

(1) What is your availability to shepherd the project through its entire cycle, including early prototyping, design review, development, implementation review, testing, and the inevitable long debugging tail that all TCP projects have?

(2) When do you think your implementation will reach a prototype phase appropriate for an expanded circle of reviewers? When do you think it might be ready for commit? Keep in mind that we're now a month or so into the 18-month cycle for 8.0, and that all serious TCP work should be completed at least six months before the end of the cycle.

(3) What potential interactions of note exist between your project and the others being planned? Are there explicit dependencies?

(4) Do you anticipate an MFC cycle for your work to RELENG_7?

I'd like for us to create a wiki page tracking these various projects, and pointing at per-project resources. Once the discussion has settled a bit, I can take responsibility for creating such a page, but will need everyone involved to help maintain it, as well as to maintain pages (on the wiki or elsewhere) regarding the status of the projects. I think it also makes a lot of sense for participants in the projects to send occasional updates and reports to net@/arch@ in order to keep people who can't track things day-to-day in the loop, and to invite review.

At the end of the day, we must be clear: the only way even a fraction of these projects can happen in time for 8.0 is if there is careful planning, coordination, and exceptional care taken in the review and testing of the changes. We cannot have the 8.0 release cycle put at risk the way the 7.0 cycle was due to inadequately reviewed and tested patches entering the tree under the assumption that problems would somehow be magically found and fixed before the release by the relatively small population of -CURRENT users. Experience tells us that changes must be extensively reviewed and tested before they enter the tree.

I'm really looking forward to the 8 development cycle, and the work that's in the pipeline is really very exciting. It will take quite a bit of dedication to make it all happen, but if even only a small part of it happens, it will still be very good news. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 14:15:55 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1FF3F16A468; Wed, 19 Dec 2007 14:15:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id B18A313C4D9; Wed, 19 Dec 2007 14:15:54 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJEFe3D024477 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 01:15:48 +1100 Date: Thu, 20 Dec 2007 01:15:40 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David G Lawrence In-Reply-To: <20071218181023.GW25053@tnn.dglawrence.com> Message-ID: <20071219235444.K928@besplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: 
, List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 14:15:55 -0000

On Tue, 18 Dec 2007, David G Lawrence wrote:

>>> I got an almost identical delay (with 64000 vnodes).
>>>
>>> Now, 17ms isn't much.
>>
>> Says you. On modern systems, trying to run a pseudo real-time application
>> on an otherwise quiescent system, 17ms is just short of an eternity. I agree
>> that the syncer should be preemptable (which is what my bandaid patch
>> attempts to do), but that probably wouldn't have helped my specific problem
>> since my application was a user process, not a kernel thread.

FreeBSD isn't a real-time system, and 17ms isn't much for it. I saw lots of syscall delays of nearly 1 second while debugging this. (With another hat, I would say that 17 us was a long time in 1992. 17 us is hundreds of times longer now.)

> One more followup (I swear I'm done, really!)... I have a laptop here
> that runs at 150MHz when it is in the lowest running CPU power save mode.
> At that speed, this bug causes a delay of more than 300ms and is enough
> to cause loss of keyboard input. I have to switch into high speed mode
> before I try to type anything, else I end up with random typos. Very
> annoying.

Yes, something is wrong if keystrokes are lost with CPUs that run at 150 kHz (sic) or faster.

Debugging shows that the problem is like I said. The loop really does take 125 ns per iteration. This time is actually not very much. The linked list of vnodes could hardly be designed better to maximize cache thrashing. My system has a fairly small L2 cache (512K or 1M), and even a few words from the vnode and the inode don't fit in the L2 cache when there are 64000 vnodes, but the vp and ip are also fairly well designed to maximize cache thrashing, so L2 cache thrashing starts at just a few thousand vnodes.
My system has fairly low latency main memory, else the problem would be larger:

% Memory latencies in nanoseconds - smaller is better
%     (WARNING - may not be correct, check graphs)
% ---------------------------------------------------
% Host      OS             Mhz   L1 $   L2 $   Main mem  Guesses
% --------- -------------  ----  -----  ------ --------  -------
% besplex.b FreeBSD 7.0-C  2205  1.361  5.6090     42.4  [PC3200 CL2.5 overclocked]
% sledge.fr FreeBSD 8.0-C  1802  1.666  8.9420     99.8
% freefall. FreeBSD 7.0-C  2778  0.746  6.6310    155.5

The loop makes the following memory accesses, at least in 5.2:

% loop:
% 	for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp != NULL; vp = nvp) {
% 		/*
% 		 * If the vnode that we are about to sync is no longer
% 		 * associated with this mount point, start over.
% 		 */
% 		if (vp->v_mount != mp)
% 			goto loop;
%
% 		/*
% 		 * Depend on the mntvnode_slock to keep things stable enough
% 		 * for a quick test. Since there might be hundreds of
% 		 * thousands of vnodes, we cannot afford even a subroutine
% 		 * call unless there's a good chance that we have work to do.
% 		 */
% 		nvp = TAILQ_NEXT(vp, v_nmntvnodes);

Access 1 word at vp offset 0x90. Costs 1 cache line. IIRC, my system has a cache line size of 0x40. Assume this, and that vp is aligned on a cache line boundary. So this access costs the cache line at vp offsets 0x80-0xbf.

% 		VI_LOCK(vp);

Access 1 word at vp offset 0x1c. Costs the cache line at vp offsets 0-0x3f.

% 		if (vp->v_iflag & VI_XLOCK) {

Access 1 word at vp offset 0x24. Cache hit.

% 			VI_UNLOCK(vp);
% 			continue;
% 		}
% 		ip = VTOI(vp);

Access 1 word at vp offset 0xa8. Cache hit.

% 		if (vp->v_type == VNON || ((ip->i_flag &

Access 1 word at vp offset 0xa0. Cache hit.

Access 1 word at ip offset 0x18. Assume that ip is aligned, as above. Costs the cache line at ip offsets 0-0x3f.

% 		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
% 		    TAILQ_EMPTY(&vp->v_dirtyblkhd))) {

Access 1 word at vp offset 0x48. Costs the cache line at vp offsets 0x40-0x7f.

% 			VI_UNLOCK(vp);

Reaccess 1 word at vp offset 0x1c. Cache hit.

% 			continue;
% 		}

The total cost is 4 cache lines or 256 bytes per vnode. So with an L2 cache size of 1MB, the L2 cache will start thrashing at numvnodes = 4096. With thrashing, and at my main memory latency of 42.4 nsec, it might take 4*42.4 = 169.6 nsec to read main memory. This is similar to my observed time. Presumably things aren't quite that bad because there is some locality for the 3 lines in each vp. It might be possible to improve this a bit by accessing the lines sequentially and not interleaving the access to ip. Better, repack vp and move the IN* flags from ip to vp (a change that has other advantages), so that everything is in 1 cache line per vp.

This isn't consistent with the delay increasing to 300 ms when the CPU is throttled -- memory shouldn't be throttled so much. On old machines, the memory was faster relative to the CPU, else noticeable 300[0] ms delays would have been common long ago. I think numvnodes grew large enough to bust L2 caches in the usual case even in 1992. This code clearly wasn't designed with caches in mind :-). The cost of subroutine calls would be in the noise compared with the cost of cache misses.
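
[Editorial sketch, not part of the original message.] The cache arithmetic above can be reproduced with a few lines of C. This is only a worked example under the assumptions stated in the message (64-byte cache lines, 4 lines touched per vnode, 42.4 nsec main memory latency); the function and macro names are made up for illustration and do not exist in the FreeBSD tree.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the message; the names are illustrative only. */
#define CACHE_LINE_BYTES 64  /* 0x40, the line size recalled in the message */
#define LINES_PER_VNODE  4   /* 3 lines of the vnode + 1 line of the inode */

/* Bytes of cache touched per vnode by one pass of the sync loop. */
static size_t
footprint_per_vnode(void)
{
	return ((size_t)CACHE_LINE_BYTES * LINES_PER_VNODE);
}

/* Number of vnodes whose combined footprint just fills an L2 cache. */
static size_t
thrash_threshold(size_t l2_bytes)
{
	assert(l2_bytes > 0);
	return (l2_bytes / footprint_per_vnode());
}

/* Pessimistic per-vnode time if every touched line misses the L2 cache. */
static double
worst_case_ns(double main_mem_ns)
{
	return (LINES_PER_VNODE * main_mem_ns);
}
```

With these assumptions, thrash_threshold(1024 * 1024) gives 4096 vnodes and thrash_threshold(512 * 1024) gives 2048, which matches "a few thousand vnodes"; worst_case_ns(42.4) gives about 169.6 nsec, the pessimistic figure compared against the observed 125 ns per iteration.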
Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 14:44:02 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9587116A41B for ; Wed, 19 Dec 2007 14:44:02 +0000 (UTC) (envelope-from rea-fbsd@codelabs.ru) Received: from pobox.codelabs.ru (pobox.codelabs.ru [144.206.177.45]) by mx1.freebsd.org (Postfix) with ESMTP id 5181C13C4CC for ; Wed, 19 Dec 2007 14:44:02 +0000 (UTC) (envelope-from rea-fbsd@codelabs.ru) DomainKey-Signature: a=rsa-sha1; q=dns; c=simple; s=one; d=codelabs.ru; h=Received:Date:From:To:Cc:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:Sender:X-Spam-Status:Subject; b=qEgJc6aBJ4IcIaJjKaYeKN5jULUgOaP29gAKC2vS+tRgne8GKc5cKrck3wHbLqm33yy+SufSlM/7wVh6XgevYNMkrzP6sHlBatro9XYIW3S88EBQ/y2Qv7s/VFxE6n/budZzZ4wU/Tzb07s5O9NxL4xNhG0cVawF6/xTQBoMSfI=; Received: from void.codelabs.ru (void.codelabs.ru [144.206.177.25]) by pobox.codelabs.ru with esmtpsa (TLSv1:AES256-SHA:256) id 1J509P-000PFN-Qj; Wed, 19 Dec 2007 17:44:00 +0300 Date: Wed, 19 Dec 2007 17:43:58 +0300 From: Eygene Ryabinkin To: vermaden Message-ID: <+4G9Nr+ZwtUziff5Dar2/aUcj4w@JA8cQVXg905K+QGregQphbHxLjw> References: <20071218172055.D14CC160055@f32.poczta.interia.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <20071218172055.D14CC160055@f32.poczta.interia.pl> Sender: rea-fbsd@codelabs.ru X-Spam-Status: No, score=-2.3 required=4.0 tests=ALL_TRUSTED,AWL,BAYES_20 Cc: freebsd-net@freebsd.org Subject: Re: default route X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 14:44:02 -0000 Tue, Dec 18, 2007 at 06:20:53PM +0100, vermaden wrote: > > After reading this I feel that you have absolutely no packets on > > either 
interfaces when your Linux box ping FreeBSD. But this > > contradicts with your previous assertion that if ICMP packet comes > > in on rl1, then it is reflected at rl0. Am I missing something? > > Yes I must mislook that, rl0 also is 'dead' while Linux box pings > my FreeBSD box using net on rl1. OK, so I feel that there are two points to check. 1. Firewall. Even if you're running GENERIC, firewall thingies are compiled as kernel modules and can be loaded by the startup scripts. The output of 'kldstat -v' will show what modules are loaded. BPF is run before filtering, so it sees packets that firewall can drop. 2. Enable ICMP verbose mode in the kernel: set the variable 'icmpprintfs' on the top of the /sys/netinet/ip_icmp.c to 1 and define ICMPPRINTFS during kernel compilation via 'makeoptions ICMPPRINTFS=1'. After this you should watch for kernel messages with the 'icmp' at the beginning of the line. Hope this helps. -- Eygene From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 14:54:30 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAFFA16A420; Wed, 19 Dec 2007 14:54:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 5679B13C468; Wed, 19 Dec 2007 14:54:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJEsLqA004385 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 01:54:22 +1100 Date: Thu, 20 Dec 2007 01:54:21 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Fullmer In-Reply-To: Message-ID: <20071220011626.U928@besplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 14:54:30 -0000

On Tue, 18 Dec 2007, Mark Fullmer wrote:

> A little progress.
>
> I have a machine with a KTR enabled kernel running.
>
> Another machine is running David's ffs_vfsops.c's patch.
>
> I left two other machines (GENERIC kernels) running the packet loss test
> overnight. At ~ 32480 seconds of uptime the problem starts. This is really

Try it with "find / -type f >/dev/null" to duplicate the problem almost instantly.

> marks are the intervals between test runs. The window of missing packets
> (timestamps between two packets where a sequence number is missing)
> is usually less than 4us, although I'm not sure gettimeofday() can be
> trusted for measuring this. See https://www.eng.oar.net/~maf/bsd6/p3.png

gettimeofday() can normally be trusted to better than 1 us for time differences of up to about 1 second. However, gettimeofday() should not be used in any program written after clock_gettime() became standard in 1994. clock_gettime() has a resolution of 1 ns. It isn't quite that accurate on current machines, but I trust it to measure differences of 10 nsec between back to back clock_gettime() calls here.
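
[Editorial sketch, not part of the original message.] Measuring differences between back to back clock_gettime() calls might look roughly like the C below. This is not wollman@'s clock-watching program, whose source is not shown in this thread; the struct and function names are hypothetical, and the use of CLOCK_MONOTONIC is an assumption made so that the deltas cannot go negative.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Summary of the deltas between consecutive clock readings (hypothetical). */
struct delta_stats {
	int64_t	min_ns;
	int64_t	max_ns;
	double	mean_ns;
	long	samples;
};

/* Nanoseconds elapsed from a to b. */
static int64_t
ts_diff_ns(struct timespec a, struct timespec b)
{
	return ((int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL +
	    (int64_t)(b.tv_nsec - a.tv_nsec));
}

/* Read the clock samples+1 times and summarise the back to back deltas. */
static struct delta_stats
measure_clock_deltas(long samples)
{
	struct delta_stats st = { INT64_MAX, 0, 0.0, samples };
	struct timespec prev, now;
	double sum = 0.0;

	assert(samples > 0);
	clock_gettime(CLOCK_MONOTONIC, &prev);
	for (long i = 0; i < samples; i++) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		int64_t d = ts_diff_ns(prev, now);
		if (d < st.min_ns)
			st.min_ns = d;
		if (d > st.max_ns)
			st.max_ns = d;
		sum += (double)d;
		prev = now;
	}
	st.mean_ns = sum / (double)samples;
	return (st);
}
```

The minimum delta approximates the cost of one clock_gettime() call plus the granularity of the underlying timecounter; the maximum mostly shows interrupts and preemption. No particular figures should be expected from this sketch, since they depend on the machine and the timecounter in use.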
Sample output from wollman@'s old clock-watching program converted to clock_gettime(): %%% 2007/12/05 (TSC) bde-current, -O2 -mcpu=athlon-xp min 238, max 99730, mean 240.025380, std 77.291436 1th: 239 (1203207 observations) 2th: 240 (556307 observations) 3th: 241 (190211 observations) 4th: 238 (50091 observations) 5th: 242 (20 observations) 2007/11/23 (TSC) bde-current min 247, max 11890, mean 247.857786, std 62.559317 1th: 247 (1274231 observations) 2th: 248 (668611 observations) 3th: 249 (56950 observations) 4th: 250 (23 observations) 5th: 263 (8 observations) 2007/05/19 (TSC) plain -current-noacpi min 262, max 286965, mean 263.941187, std 41.801400 1th: 264 (1343245 observations) 2th: 263 (626226 observations) 3th: 265 (26860 observations) 4th: 262 (3572 observations) 5th: 268 (8 observations) 2007/05/19 (TSC) plain -current-acpi min 261, max 68926, mean 279.848650, std 40.477440 1th: 261 (999391 observations) 2th: 320 (473325 observations) 3th: 262 (373831 observations) 4th: 321 (148126 observations) 5th: 312 (4759 observations) 2007/05/19 (ACPI-fast timecounter) plain -current-acpi min 558, max 285494, mean 827.597038, std 78.322301 1th: 838 (1685662 observations) 2th: 839 (136980 observations) 3th: 559 (72160 observations) 4th: 837 (48902 observations) 5th: 558 (31217 observations) 2007/05/19 (i8254) plain -current-acpi min 3352, max 288288, mean 4182.774148, std 257.977752 1th: 4190 (1423885 observations) 2th: 4191 (440158 observations) 3th: 3352 (65261 observations) 4th: 5028 (39202 observations) 5th: 5029 (15456 observations) %%% "min" here gives the minimum latency of a clock_gettime() syscall. The improvement from 247 nsec to 240 nsec in the "mean" due to -O2 -march-athlon-xp can be trusted to be measured very accurately since it is an average over more than 100 million trials, and the improvement from 247 nsec to 238 nsec for "min" can be trusted because it is consistent with the improvement in the mean. 
The program had to be converted to use clock_gettime() a few years ago when CPU speeds increased so much that the correct "min" became significantly less than 1 us. With gettimeofday(), it cannot distinguish between an overhead of 1 ns and an overhead of 1 us.

For the ACPI and i8254 timecounter, you can see that the low-level timecounters have a low frequency clock from the large gaps between the observations. There is a gap of 279-280 ns for the acpi timecounter. This is the period of the acpi timecounter's clock (frequency 14318182/4 Hz = period 279.3651 ns). Since we can observe this period to within 1 ns, we must have a basic accuracy of nearly 1 ns, but if we make only 2 observations we are likely to have an inaccuracy of 279 ns due to the granularity of the clock. The TSC has a clock granularity of 6 ns on my CPU, and delivers almost that much accuracy with only 2 observations, but technical problems prevent general use of the TSC.

Bruce

From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 15:19:27 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA03A16A41A; Wed, 19 Dec 2007 15:19:27 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id B5AF113C478; Wed, 19 Dec 2007 15:19:27 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBJFJR29078751; Wed, 19 Dec 2007 07:19:27 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBJFJREq078750; Wed, 19 Dec 2007 07:19:27 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Wed, 19 Dec 2007 07:19:27 -0800 From: David G Lawrence To: 
Bruce Evans Message-ID: <20071219151926.GA25053@tnn.dglawrence.com> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071219235444.K928@besplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Wed, 19 Dec 2007 07:19:27 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 15:19:28 -0000 > On Tue, 18 Dec 2007, David G Lawrence wrote: > > >>>I got an almost identical delay (with 64000 vnodes). > >>> > >>>Now, 17ms isn't much. > >> > >> Says you. On modern systems, trying to run a pseudo real-time > >> application > >>on an otherwise quiescent system, 17ms is just short of an eternity. I > >>agree > >>that the syncer should be preemptable (which is what my bandaid patch > >>attempts to do), but that probably wouldn't have helped my specific > >>problem > >>since my application was a user process, not a kernel thread. > > FreeBSD isn't a real-time system, and 17ms isn't much for it. I saw lots I never said it was, but that doesn't stop us from using FreeBSD in pseudo real-time applications. This is made possible by fast CPUs and dedicated-task systems where the load is carefully controlled. > of syscall delays of nearly 1 second while debugging this. 
(With another I can make the delay several minutes by pushing the reset button. > Debugging shows that the problem is like I said. The loop really does > take 125 ns per iteration. This time is actually not very much. The Considering that the CPU clock cycle time is on the order of 300ps, I would say 125ns to do a few checks is pathetic. In any case, it appears that my patch is a no-op, at least for the problem I was trying to solve. This has me confused, however, because at one point the problem was mitigated with it. The patch has gone through several iterations, however, and it could be that it was made to the top of the loop, before any of the checks, in a previous version. Hmmm. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 15:48:58 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 382E316A419; Wed, 19 Dec 2007 15:48:58 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 18B8213C46E; Wed, 19 Dec 2007 15:48:57 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBJFmuac097907; Wed, 19 Dec 2007 07:48:56 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBJFmuvg097900; Wed, 19 Dec 2007 07:48:56 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Wed, 19 Dec 2007 07:48:56 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071219154856.GC25053@tnn.dglawrence.com> 
References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071220011626.U928@besplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Wed, 19 Dec 2007 07:48:56 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 15:48:58 -0000 > Try it with "find / -type f >/dev/null" to duplicate the problem almost > instantly. FreeBSD used to have some code that would cause vnodes with no cached pages to be recycled quickly (which would have made a simple find ineffective without reading the files at least a little bit). I guess that got removed when the size of the vnode pool was dramatically increased. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 16:02:24 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C8B116A418; Wed, 19 Dec 2007 16:02:24 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 79ED213C478; Wed, 19 Dec 2007 16:02:24 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBJG2Ont007111; Wed, 19 Dec 2007 08:02:24 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBJG2OrV007110; Wed, 19 Dec 2007 08:02:24 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Wed, 19 Dec 2007 08:02:24 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071219160224.GD25053@tnn.dglawrence.com> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071219151926.GA25053@tnn.dglawrence.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Wed, 19 Dec 2007 08:02:24 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list 
List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 16:02:24 -0000 > In any case, it appears that my patch is a no-op, at least for the > problem I was trying to solve. This has me confused, however, because at > one point the problem was mitigated with it. The patch has gone through > several iterations, however, and it could be that it was made to the top > of the loop, before any of the checks, in a previous version. Hmmm. (replying to myself) I just found an earlier version of the patch, and sure enough, it was to the top of the loop. Unfortunately, that version caused the system to crash because vp was occasionally invalid after the wakeup. Anyway, let's see if Mark's packet loss problem is indeed related to this code. If he does the find just after boot and immediately sees the problem, then I would say that is fairly conclusive. He could also release the cached vnodes by temporarily setting kern.maxvnodes=10000 and then setting it back to whatever it was previously (probably 60000-100000). If the problem then goes away for awhile, that would be another good indicator. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 16:05:12 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 110A316A421; Wed, 19 Dec 2007 16:05:12 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id 6288913C4F7; Wed, 19 Dec 2007 16:05:11 +0000 (UTC) (envelope-from lastewart@swin.edu.au) Received: from newbox.caia.swin.edu.au (124-168-6-25.dyn.iinet.net.au [124.168.6.25]) (authenticated bits=0) by lauren.room52.net (8.13.8/8.13.8) with ESMTP id lBJFoNSn012477 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 02:50:23 +1100 (EST) (envelope-from lastewart@swin.edu.au) Message-ID: <47693DBD.6050104@swin.edu.au> Date: Thu, 20 Dec 2007 02:50:21 +1100 From: Lawrence Stewart User-Agent: Thunderbird 2.0.0.4 (X11/20070625) MIME-Version: 1.0 To: Robert Watson References: <20071219123305.Y95322@fledge.watson.org> In-Reply-To: <20071219123305.Y95322@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00, RCVD_IN_SORBS_DUL,RDNS_DYNAMIC autolearn=disabled version=3.2.3 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on lauren.room52.net Cc: James Healy , arch@freebsd.org, net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 16:05:12 -0000 Hi Robert, Comments inline. Robert Watson wrote: > > Dear all, > > It is rapidly becoming clear that quite a few of us have Big Plans for > the TCP implementation over the next 12-18 months. 
It's important > that we get the plans out on the table now so that everyone working on > these projects is aware of the larger context. This will encourage > collaboration, but also allow us to manage the risks inevitably > associated with having several simultaneous projects going on in a > very complex software base. With that in mind, here are the large > projects I'm currently aware of: > > Project Flag Wavers Status > ------- ----------- ------ > TCP offload Kip Macy Moving to CVS and under > review and testing; one > supporting device driver. > > TCP congestion control Sam Leffler, At least one prototype > Rui Paulo, implementation, to move to p4 > Andre Oppermann, > Kip Macy, > Lawrence Stewart, > James Healy > > TCP overhaul Andre Oppermann Glimmer in eye, to move to > p4. > > TCP lock granularity/ Robert Watson Glimmer in eye, to occur in > increased parallelism p4. > > TCP timer unification Andre Oppermann, Previously committed, and to > Mike Silbersack be reintroduced via p4. > > Monitoring ABI cleanup Robert Watson Glimmer in eye, to > occur in > p4. > > Looking at the above, it sounds like a massive amount of work taking > place, so we will need to coordinate carefully. I'd like to encourage > people to avoid creating unnecessary dependencies between changes, and > to be especially careful in coordinating potentially MFCable changes. > There are (at least) two conflicting scheduling desires in play here: > > - A desire to merge MFCable changes early, so that they aren't > entangled with > un-mergeable changes. This will simplify merging and also maximize the > extent to which testing in HEAD will apply to them once merged to > RELENG_7. > > - A desire to merge large-scale infrastructural changes early so that > they see > the greatest exposure, and so that they can be introduced > incrementally over > a longer period of time to shake each out. > > Both of these are valid perspectives, and will need to be balanced. 
I > have a few questions, then, for people involved in these or other > projects: > > (0) Is your project in the above list? If not, could you send out a > reply > talking a bit about the project, who's involved, where it's taking > place, > etc. Rui@ recently posted a TCP ECN patch that probably belongs in the list (http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015979.html) unless it has already recently been committed. Jim and I recently discussed the idea of implementing autotuning of the TCP reassembly queue size based on analysis of some experimental work we've been doing. It's a small project, but we feel it would be worth implementing. Details follow... Problem description: Currently, "net.inet.tcp.reass.maxqlen" specifies the maximum number of segments that can be held in the reassembly queue for a TCP connection. The current default value is 48, which equates to approx. 69k of buffer space if MSS = 1448 bytes. This means that if the TCP window grows to be more than 48 segments wide, and a packet is lost, the receiver will buffer the next 48 segments in the reassembly queue and subsequently drop all the remaining segments in the window because the reassembly buffer is full i.e. 1 packet loss in the network can equate to many packet losses at the receiver because of insufficient buffering. This obviously has a negative impact on performance in environments where there is non-zero packet loss. With the addition of automatic socket buffer tuning in FreeBSD 7, the ability for the TCP window to grow above 48 segments is going to be even more prevalent than it is now, so this issue will continue to affect connections to FreeBSD based TCP receivers. We observed that the socket receive buffer size provides a good indication of the expected number of bytes in flight for a connection, and can therefore serve as the figure to base the size of the reassembly queue on. 
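
[Editorial sketch, not part of the original message.] The figures in the problem description can be checked with a small worked example in C. The helper names are hypothetical and are not the FreeBSD implementation; the drop count assumes the simplest case described above, where one segment is lost at the front of the window and everything after it arrives in order. The per-connection cap passed to the last function is likewise an assumed parameter, not an existing sysctl.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the reassembly queue can hold at a given segment limit. */
static size_t
reass_bytes(size_t maxqlen, size_t mss)
{
	return (maxqlen * mss);
}

/*
 * Segments discarded at the receiver after a single loss in a window of
 * window_segs segments: the lost segment leaves a hole, the next maxqlen
 * segments are queued, and whatever follows is dropped.
 */
static size_t
reass_overflow_drops(size_t window_segs, size_t maxqlen)
{
	size_t following = window_segs > 0 ? window_segs - 1 : 0;

	return (following > maxqlen ? following - maxqlen : 0);
}

/*
 * Hypothetical per-connection queue limit derived from the socket receive
 * buffer size: enough segments to cover the buffer, up to a hard cap.
 */
static size_t
reass_qlen_for_rcvbuf(size_t rcvbuf_bytes, size_t mss, size_t hard_cap)
{
	size_t qlen;

	assert(mss > 0);
	qlen = (rcvbuf_bytes + mss - 1) / mss;	/* round up */
	return (qlen < hard_cap ? qlen : hard_cap);
}
```

With the default of 48 segments and MSS = 1448 bytes, reass_bytes(48, 1448) is 69504 bytes, the "approx. 69k" quoted above. For a window of 100 segments, reass_overflow_drops(100, 48) is 51, so one loss in the network turns into 52 missing segments at the receiver. A 256 KB receive buffer would call for reass_qlen_for_rcvbuf(262144, 1448, 1000) = 182 segments under this sketch.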
Basic project description:

- Make the reassembly queue's max length a per-connection variable to appropriately tailor the reassembly queue buffer size for each connection
- Piggyback automated reassembly queue sizing with the code that resizes the socket receive buffer
- The socket buffer tuning code already has the required infrastructure to cap the max buffer size, so this would implicitly limit the size of the reassembly queue
- If the socket buffer sizes were explicitly overridden using sockopts (e.g. to support large windows for particular apps), the reassembly queue would grow to accommodate only connections using the larger than normal receive buffer.
- The net.inet.tcp.reass.maxsegments tunable would still be left intact to ensure users can set a hard cap on the max amount of memory allowed for reassembly buffering.

>
> (1) What is your availability to shepherd the project through its entire
> cycle, including early prototyping, design review, development,
> implementation review, testing, and the inevitable long debugging tail
> that all TCP projects have.

We should be able to run the reassembly queue project full cycle.

>
> (2) When do you think your implementation will reach a prototype phase
> appropriate for an expanded circle of reviewers? When do you think it
> might be ready for commit? Keep in mind that we're now a month or so into
> the 18-month cycle for 8.0, and that all serious TCP work should be
> completed at least six months before the end of the cycle.

To be safe, I'll say we should have a prototype ready by the end of Feb 2008, though I suspect we'll have something ready sooner than that. Commit ready code should follow very shortly after that (few weeks at most), as we anticipate that the patch will be very simple.

>
> (3) What potential interactions of note exist between your project and the
> others being planned. Are there explicit dependencies?
The "TCP Overhaul" project would possibly alter the location of the changes, but shouldn't affect the essence of the changes themselves. It's unlikely any of the other projects would affect this one. > > (4) Do you anticipate an MFC cycle for your work to RELENG_7? Yes. A munged version could also be made available for RELENG_6.... it just wouldn't be based on automatic receive buffer tuning, and would probably be based on a static calculation during connection initialisation. > > I'd like for us to create a wiki page tracking these various projects, > and pointing at per-project resources. Once the discussion has > settled a bit, I can take responsibility for creating such a page, but > will need everyone involved to help maintain it, as well as to > maintain pages (on the wiki or elsewhere) regarding the status of the > projects. I think it also makes a lot of sense for participants in > the projects to send occasional updates and reports to net@/arch@ in > order to keep people who can't track things day-to-date in the loop, > and to invite review. Sounds fair. 
[snip] Cheers, Jim and Lawrence From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 16:31:10 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D519016A418 for ; Wed, 19 Dec 2007 16:31:10 +0000 (UTC) (envelope-from ups@freebsd.org) Received: from smtpauth02.prod.mesa1.secureserver.net (smtpauth02.prod.mesa1.secureserver.net [64.202.165.182]) by mx1.freebsd.org (Postfix) with SMTP id A24D013C459 for ; Wed, 19 Dec 2007 16:31:09 +0000 (UTC) (envelope-from ups@freebsd.org) Received: (qmail 32630 invoked from network); 19 Dec 2007 16:04:29 -0000 Received: from unknown (66.23.216.53) by smtpauth02.prod.mesa1.secureserver.net (64.202.165.182) with ESMTP; 19 Dec 2007 16:04:28 -0000 Message-ID: <4769410E.3050005@freebsd.org> Date: Wed, 19 Dec 2007 11:04:30 -0500 From: Stephan Uphoff User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: David G Lawrence References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <20071219154856.GC25053@tnn.dglawrence.com> In-Reply-To: <20071219154856.GC25053@tnn.dglawrence.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 16:31:10 -0000 David G Lawrence wrote: >> Try it with "find / -type f >/dev/null" to duplicate the problem almost >> instantly. >> > > FreeBSD used to have some code that would cause vnodes with no cached > pages to be recycled quickly (which would have made a simple find > ineffective without reading the files at least a little bit). 
I guess > that got removed when the size of the vnode pool was dramatically > increased. > You can decrease vfs.wantfreevnodes if caching files without cached data is not beneficial for your application. > -DG > > David G. Lawrence > President > Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 > The FreeBSD Project - http://www.freebsd.org > Pave the road of life with opportunities. > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > > From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 16:49:52 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D589B16A419; Wed, 19 Dec 2007 16:49:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by mx1.freebsd.org (Postfix) with ESMTP id 77EE913C465; Wed, 19 Dec 2007 16:49:52 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJGnk3A028545 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 03:49:47 +1100 Date: Thu, 20 Dec 2007 03:49:45 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071219151926.GA25053@tnn.dglawrence.com> Message-ID: <20071220032223.V38101@delplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> 
<20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 16:49:52 -0000

On Wed, 19 Dec 2007, David G Lawrence wrote:

>> Debugging shows that the problem is like I said. The loop really does
>> take 125 ns per iteration. This time is actually not very much. The
>
> Considering that the CPU clock cycle time is on the order of 300ps, I
> would say 125ns to do a few checks is pathetic.

As I said, 125 nsec is a short time in this context. It is approximately
the time for a single L2 cache miss on a machine with slow memory like
freefall (Xeon 2.8 GHz with L2 cache latency of 155.5 ns). As I said,
the code is organized so as to give about 4 L2 cache misses per vnode
if there are more than a few thousand vnodes, so it is doing very well
to take only 125 nsec for a few checks.

> In any case, it appears that my patch is a no-op, at least for the
> problem I was trying to solve. This has me confused, however, because at
> one point the problem was mitigated with it. The patch has gone through
> several iterations, however, and it could be that it was made to the top
> of the loop, before any of the checks, in a previous version. Hmmm.

The patch should work fine. IIRC, it yields voluntarily so that other
things can run. I committed a similar hack for uiomove(). It was easy
to make syscalls that take many seconds (now tenths of seconds instead
of seconds?), and without yielding or PREEMPTION or multiple CPUs,
everything except interrupts has to wait for these syscalls.
Now the main problem is to figure out why PREEMPTION doesn't work. I'm not working on this directly since I'm running ~5.2 where nearly-full kernel preemption doesn't work due to Giant locking. Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 17:00:04 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA9CA16A47A; Wed, 19 Dec 2007 17:00:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id 657E113C4EF; Wed, 19 Dec 2007 17:00:03 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJGxts6007170 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 03:59:57 +1100 Date: Thu, 20 Dec 2007 03:59:55 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071219154856.GC25053@tnn.dglawrence.com> Message-ID: <20071220035129.R38221@delplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <20071219154856.GC25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 17:00:05 -0000 On Wed, 19 Dec 2007, David G Lawrence wrote: >> Try it with "find / -type f >/dev/null" to duplicate the problem almost >> instantly. 
> > FreeBSD used to have some code that would cause vnodes with no cached > pages to be recycled quickly (which would have made a simple find > ineffective without reading the files at least a little bit). I guess > that got removed when the size of the vnode pool was dramatically > increased. It might still. The data should be cached somewhere, but caching it in both the buffer cache/VMIO and the vnode/inode is wasteful. I may have been only caching vnodes for directories. I switched to using a find or a tar on /home/ncvs/ports since that has a very high density of directories. Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 17:04:34 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D62B716A417; Wed, 19 Dec 2007 17:04:34 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id B4B1213C447; Wed, 19 Dec 2007 17:04:34 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBJH4Yt0046644; Wed, 19 Dec 2007 09:04:34 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBJH4YCu046643; Wed, 19 Dec 2007 09:04:34 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Wed, 19 Dec 2007 09:04:34 -0800 From: David G Lawrence To: Bruce Evans Message-ID: <20071219170434.GG25053@tnn.dglawrence.com> References: <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> 
<20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> <20071220032223.V38101@delplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071220032223.V38101@delplex.bde.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Wed, 19 Dec 2007 09:04:34 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 17:04:34 -0000 > > In any case, it appears that my patch is a no-op, at least for the > >problem I was trying to solve. This has me confused, however, because at > >one point the problem was mitigated with it. The patch has gone through > >several iterations, however, and it could be that it was made to the top > >of the loop, before any of the checks, in a previous version. Hmmm. > > The patch should work fine. IIRC, it yields voluntarily so that other > things can run. I committed a similar hack for uiomove(). It was It patches the bottom of the loop, which is only reached if the vnode is dirty. So it will only help if there are thousands of dirty vnodes. While that condition can certainly happen, it isn't the case that I'm particularly interested in. > CPUs, everything except interrupts has to wait for these syscalls. Now > the main problem is to figure out why PREEMPTION doesn't work. I'm > not working on this directly since I'm running ~5.2 where nearly-full > kernel preemption doesn't work due to Giant locking. I don't understand how PREEMPTION is supposed to work (I mean to any significant detail), so I can't really comment on that. -DG David G. Lawrence President Download Technologies, Inc. 
- http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 17:07:18 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2CA816A469 for ; Wed, 19 Dec 2007 17:07:18 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 1E94A13C46B for ; Wed, 19 Dec 2007 17:07:18 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 43387 invoked from network); 19 Dec 2007 17:07:17 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 19 Dec 2007 17:07:17 -0000 In-Reply-To: <20071220011626.U928@besplex.bde.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Wed, 19 Dec 2007 12:06:59 -0500 To: Bruce Evans X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 17:07:18 -0000 On Dec 19, 2007, at 9:54 AM, Bruce Evans wrote: > On Tue, 18 Dec 2007, Mark Fullmer wrote: > >> A little progress. >> >> I have a machine with a KTR enabled kernel running. >> >> Another machine is running David's ffs_vfsops.c's patch. 
>> >> I left two other machines (GENERIC kernels) running the packet >> loss test >> overnight. At ~ 32480 seconds of uptime the problem starts. This >> is really > > Try it with "find / -type f >/dev/null" to duplicate the problem > almost > instantly. I was able to verify last night that (cd /; tar -cpf -) > all.tar would trigger the problem. I'm working getting a test running with David's ffs_sync() workaround now, adding a few counters there should get this narrowed down a little more. Thanks for the other info on timer resolution, I overlooked clock_gettime(). -- mark From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 17:13:32 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0FE716A419; Wed, 19 Dec 2007 17:13:32 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id A121B13C458; Wed, 19 Dec 2007 17:13:32 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBJHDVgN052479; Wed, 19 Dec 2007 09:13:31 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBJHDVFL052478; Wed, 19 Dec 2007 09:13:31 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Wed, 19 Dec 2007 09:13:31 -0800 From: David G Lawrence To: Mark Fullmer Message-ID: <20071219171331.GH25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Wed, 19 Dec 2007 09:13:31 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 17:13:32 -0000 > >Try it with "find / -type f >/dev/null" to duplicate the problem > >almost > >instantly. > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would > trigger the problem. I'm working getting a test running with > David's ffs_sync() workaround now, adding a few counters there should > get this narrowed down a little more. Unfortunately, the version of the patch that I sent out isn't going to help your problem. It needs to yield at the top of the loop, but vp isn't necessarily valid after the wakeup from the msleep. That's a problem that I'm having trouble figuring out a solution to - the solutions that come to mind will all significantly increase the overhead of the loop. As a very inadequate work-around, you might consider lowering kern.maxvnodes to something like 20000 - that might be low enough to not trigger the problem, but also be high enough to not significantly affect system I/O performance. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
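
The constraint described above (give up the CPU every so often during a long scan, without relying on a cursor that may have gone stale while sleeping) can be illustrated in user space. This is only a minimal sketch under stated assumptions: an array index stands in for the vnode list pointer, sched_yield() stands in for msleep(), and struct item, YIELD_INTERVAL and scan_with_yield() are hypothetical names for illustration, not kernel code.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode: only the "dirty" flag matters here. */
struct item {
	int dirty;
};

#define YIELD_INTERVAL 500	/* assumed batch size, not a tuned value */

/*
 * Scan all items, counting the dirty ones, and give up the CPU after
 * every YIELD_INTERVAL clean items. Returns the number of yields so the
 * behaviour can be checked. The cursor is an array index, so it stays
 * valid across the yield; a linked-list cursor would not be guaranteed
 * to, which is the problem discussed in the message above.
 */
static size_t
scan_with_yield(const struct item *items, size_t n, size_t *ndirty)
{
	size_t clean_run = 0, yields = 0;

	*ndirty = 0;
	for (size_t i = 0; i < n; i++) {
		if (items[i].dirty) {
			(*ndirty)++;	/* real code would flush here */
			continue;
		}
		if (++clean_run == YIELD_INTERVAL) {
			clean_run = 0;
			sched_yield();	/* let other runnable threads in */
			yields++;
		}
	}
	return (yields);
}
```

Counting only the clean entries keeps the extra work in the common path down to one increment and one compare per item.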
From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 18:09:34 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C720B16A420; Wed, 19 Dec 2007 18:09:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id 6986413C448; Wed, 19 Dec 2007 18:09:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJI9Ni5005222 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 05:09:25 +1100 Date: Thu, 20 Dec 2007 05:09:23 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20071220032223.V38101@delplex.bde.org> Message-ID: <20071220044515.K4939@besplex.bde.org> References: <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> <20071220032223.V38101@delplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 18:09:34 -0000 On Thu, 20 Dec 2007, Bruce Evans wrote: > On Wed, 19 Dec 2007, David G 
Lawrence wrote:
>> Considering that the CPU clock cycle time is on the order of 300ps, I
>> would say 125ns to do a few checks is pathetic.
>
> As I said, 125 nsec is a short time in this context. It is approximately
> the time for a single L2 cache miss on a machine with slow memory like
> freefall (Xeon 2.8 GHz with L2 cache latency of 155.5 ns). As I said,

Perfmon counts for the cache misses during sync(1):

==> /tmp/kg1/z0 <==
vfs.numvnodes: 630
# s/kx-dc-accesses
484516
# s/kx-dc-misses
20852
misses = 4%

==> /tmp/kg1/z1 <==
vfs.numvnodes: 9246
# s/kx-dc-accesses
884361
# s/kx-dc-misses
89833
misses = 10%

==> /tmp/kg1/z2 <==
vfs.numvnodes: 20312
# s/kx-dc-accesses
1389959
# s/kx-dc-misses
178207
misses = 13%

==> /tmp/kg1/z3 <==
vfs.numvnodes: 80802
# s/kx-dc-accesses
4122411
# s/kx-dc-misses
658740
misses = 16%

==> /tmp/kg1/z4 <==
vfs.numvnodes: 138557
# s/kx-dc-accesses
7150726
# s/kx-dc-misses
1129997
misses = 16%

===

I forgot to only count active vnodes in the above. vfs.freevnodes was
small (< 5%).

I set kern.maxvnodes to 200000, but vfs.numvnodes saturated at 138557
(probably all that fits in kvm or main memory on i386 with 1GB RAM).

With 138557 vnodes, a null sync(2) takes 39673 us according to kdump -R.
That is 35.1 ns per miss. This is consistent with lmbench2's estimate of
42.5 ns for main memory latency.

Watching vfs.*vnodes confirmed that vnode caching still works like you
said:
o "find /home/ncvs/ports -type f" only gives a vnode for each directory
o a repeated "find /home/ncvs/ports -type f" is fast because everything
  remains cached by VMIO. FreeBSD performed very badly at this benchmark
  before VMIO existed and was used for directories
o "tar cf /dev/zero /home/ncvs/ports" gives a vnode for files too.
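
As a rough cross-check of the figures above, the miss percentages and the time per miss can be recomputed from the raw counters. A minimal sketch; miss_percent() and ns_per_miss() are hypothetical helper names, and dividing the sync(2) elapsed time by the miss count assumes, as the message does, that the misses dominate the run time.

```c
/* Data-cache miss rate as a percentage of accesses. */
static double
miss_percent(unsigned long misses, unsigned long accesses)
{
	return (100.0 * (double)misses / (double)accesses);
}

/* Average elapsed time per miss in nanoseconds, given microseconds. */
static double
ns_per_miss(unsigned long elapsed_us, unsigned long misses)
{
	return (1000.0 * (double)elapsed_us / (double)misses);
}
```

For the largest case, miss_percent(1129997, 7150726) comes to a little under 16, and ns_per_miss(39673, 1129997) to about 35.1, matching the numbers quoted.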
Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 18:12:12 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FF6816A46E; Wed, 19 Dec 2007 18:12:12 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 0E94713C458; Wed, 19 Dec 2007 18:12:11 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J53Ol-000621-AH; Wed, 19 Dec 2007 20:12:10 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBJIBxCx002569; Wed, 19 Dec 2007 20:11:59 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBJIBxik002568; Wed, 19 Dec 2007 20:11:59 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 19 Dec 2007 20:11:59 +0200 From: Kostik Belousov To: David G Lawrence Message-ID: <20071219181158.GC57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="QRj9sO5tAVLaXnSD" Content-Disposition: inline In-Reply-To: <20071219171331.GH25053@tnn.dglawrence.com> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: eb35aeac2b84d7677a861f668fbe3b20 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1929 
[Dec 19 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 18:12:12 -0000

--QRj9sO5tAVLaXnSD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote:
> > >Try it with "find / -type f >/dev/null" to duplicate the problem
> > >almost
> > >instantly.
> >
> > I was able to verify last night that (cd /; tar -cpf -) > all.tar would
> > trigger the problem. I'm working getting a test running with
> > David's ffs_sync() workaround now, adding a few counters there should
> > get this narrowed down a little more.
>
> Unfortunately, the version of the patch that I sent out isn't going to
> help your problem. It needs to yield at the top of the loop, but vp isn't
> necessarily valid after the wakeup from the msleep. That's a problem that
> I'm having trouble figuring out a solution to - the solutions that come
> to mind will all significantly increase the overhead of the loop.
> As a very inadequate work-around, you might consider lowering
> kern.maxvnodes to something like 20000 - that might be low enough to
> not trigger the problem, but also be high enough to not significantly
> affect system I/O performance.

I think the following may be safe. It counts only the clean scanned vnodes
and does not evaluate the vp, that indeed may be reclaimed, after the sleep.
I never booted with the change.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index cbccc62..e686b97 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -1176,6 +1176,7 @@ ffs_sync(mp, waitfor, td)
 	struct ufsmount *ump = VFSTOUFS(mp);
 	struct fs *fs;
 	int error, count, wait, lockreq, allerror = 0;
+	int yield_count;
 	int suspend;
 	int suspended;
 	int secondary_writes;
@@ -1216,6 +1217,7 @@ loop:
 	softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps);
 	MNT_ILOCK(mp);
 
+	yield_count = 0;
 	MNT_VNODE_FOREACH(vp, mp, mvp) {
 		/*
 		 * Depend on the mntvnode_slock to keep things stable enough
@@ -1233,6 +1235,11 @@ loop:
 		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
 		    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
 			VI_UNLOCK(vp);
+			if (yield_count++ == 500) {
+				yield_count = 0;
+				msleep(&yield_count, MNT_MTX(mp), PZERO,
+				    "ffspause", 1);
+			}
 			continue;
 		}
 		MNT_IUNLOCK(mp);

--QRj9sO5tAVLaXnSD
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHaV7uC3+MBN1Mb4gRAqLNAJ471VG5oznpot2N3bfli+CxXeDDlQCfb3r5
c/Pmx/oykpmlmw9bqog0ci4=
=Z4qH
-----END PGP SIGNATURE-----

--QRj9sO5tAVLaXnSD--

From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 18:24:54 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4F62D16A417; Wed, 19 Dec 2007 18:24:54 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id E2AC713C45D; Wed, 19 Dec 2007 18:24:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J53bA-0008x5-5b; Wed, 19 Dec 2007 20:24:52 +0200 Received: from deviant.kiev.zoral.com.ua
(kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBJIOn83006616; Wed, 19 Dec 2007 20:24:49 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBJIOmJ3006615; Wed, 19 Dec 2007 20:24:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 19 Dec 2007 20:24:48 +0200 From: Kostik Belousov To: David G Lawrence Message-ID: <20071219182448.GD57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071219181158.GC57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="wULyF7TL5taEdwHz" Content-Disposition: inline In-Reply-To: <20071219181158.GC57756@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: c743d2f542030c7d5376417eb66b5b31 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1929 [Dec 19 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 18:24:54 -0000 --wULyF7TL5taEdwHz Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Dec 19, 2007 at 08:11:59PM +0200, Kostik Belousov wrote:
> On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote:
> > > >Try it with "find / -type f >/dev/null" to duplicate the problem
> > > >almost
> > > >instantly.
> > >
> > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would
> > > trigger the problem. I'm working getting a test running with
> > > David's ffs_sync() workaround now, adding a few counters there should
> > > get this narrowed down a little more.
> >
> > Unfortunately, the version of the patch that I sent out isn't going to
> > help your problem. It needs to yield at the top of the loop, but vp isn't
> > necessarily valid after the wakeup from the msleep. That's a problem that
> > I'm having trouble figuring out a solution to - the solutions that come
> > to mind will all significantly increase the overhead of the loop.
> > As a very inadequate work-around, you might consider lowering
> > kern.maxvnodes to something like 20000 - that might be low enough to
> > not trigger the problem, but also be high enough to not significantly
> > affect system I/O performance.
>
> I think the following may be safe. It counts only the clean scanned vnodes
> and does not evaluate the vp, that indeed may be reclaimed, after the sleep.
>
> I never booted with the change.
>
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index cbccc62..e686b97 100644

Or, better to use uio_yield(). See below.

diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index cbccc62..5d2535f 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -1176,6 +1176,7 @@ ffs_sync(mp, waitfor, td)
 	struct ufsmount *ump = VFSTOUFS(mp);
 	struct fs *fs;
 	int error, count, wait, lockreq, allerror = 0;
+	int yield_count;
 	int suspend;
 	int suspended;
 	int secondary_writes;
@@ -1216,6 +1217,7 @@ loop:
 	softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps);
 	MNT_ILOCK(mp);
 
+	yield_count = 0;
 	MNT_VNODE_FOREACH(vp, mp, mvp) {
 		/*
 		 * Depend on the mntvnode_slock to keep things stable enough
@@ -1233,6 +1235,12 @@ loop:
 		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
 		    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
 			VI_UNLOCK(vp);
+			if (yield_count++ == 500) {
+				MNT_IUNLOCK(mp);
+				yield_count = 0;
+				uio_yield();
+				goto relock_mp;
+			}
 			continue;
 		}
 		MNT_IUNLOCK(mp);
@@ -1247,6 +1255,7 @@ loop:
 		if ((error = ffs_syncvnode(vp, waitfor)) != 0)
 			allerror = error;
 		vput(vp);
+	relock_mp:
 		MNT_ILOCK(mp);
 	}
 	MNT_IUNLOCK(mp);

--wULyF7TL5taEdwHz
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQFHaWHwC3+MBN1Mb4gRArC6AJ4rYZhWlamxL8uvszTZp2sVfNACkQCgqugO
4roWpidQRMN1XzFyhqB/2f0=
=e7xk
-----END PGP SIGNATURE-----

--wULyF7TL5taEdwHz--

From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 18:33:52 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5BBE916A420 for ; Wed, 19 Dec 2007 18:33:52 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 2209213C478 for ; Wed, 19 Dec 2007 18:33:51 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 60029 invoked from network); 19 Dec 2007 18:33:51 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?)
(192.148.251.71) by sv1.eng.oar.net with SMTP; 19 Dec 2007 18:33:51 -0000 In-Reply-To: <20071219171331.GH25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <50B64D0B-35E6-453F-A8AF-65982A503E20@eng.oar.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Wed, 19 Dec 2007 13:33:34 -0500 To: David G Lawrence X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 18:33:52 -0000

Just to confirm: the patch did not change the behavior. I ran with it
last night and double checked this morning to make sure.

It looks like if you put the check at the top of the loop and the next
node is changed during msleep(), SLIST_NEXT will walk into the trash.
I'm in over my head here....

Setting kern.maxvnodes=1000 does stop both the periodic packet loss and
the high-latency syscalls, so it does look like walking this chain
without yielding the processor is part of the problem I'm seeing.

The other behavior I don't understand is why the em driver is able to
increment if_ipackets but still lose the packet.
Dumping the internal stats with dev.em.1.stats=1: Dec 19 13:07:46 dytnq-nf1 kernel: em1: Excessive collisions = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Sequence errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Defer count = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Missed Packets = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive No Buffers = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive Length Errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Crc errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Alignment errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Collision/Carrier extension errors = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: RX overruns = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: watchdog timeouts = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Rcvd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Xmtd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Rcvd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Xmtd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Rcvd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Xmtd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Xmtd = 0 Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Failed = 0 With FreeBSD 4 I was able to run a UDP data collector with rtprio set, kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF in the application. If packets were dropped they would show up with netstat -s as "dropped due to full socket buffers". Since the packet never makes it to ip_input() I no longer have any way to count drops. There will always be corner cases where interrupts are lost and drops not accounted for if the adapter hardware can't report them, but right now I've got no way to estimate any loss. -- mark On Dec 19, 2007, at 12:13 PM, David G Lawrence wrote: >>> Try it with "find / -type f >/dev/null" to duplicate the problem >>> almost >>> instantly. 
>> >> I was able to verify last night that (cd /; tar -cpf -) > all.tar >> would >> trigger the problem. I'm working getting a test running with >> David's ffs_sync() workaround now, adding a few counters there should >> get this narrowed down a little more. > > Unfortunately, the version of the patch that I sent out isn't > going to > help your problem. It needs to yield at the top of the loop, but vp > isn't > necessarily valid after the wakeup from the msleep. That's a > problem that > I'm having trouble figuring out a solution to - the solutions that > come > to mind will all significantly increase the overhead of the loop. > As a very inadequate work-around, you might consider lowering > kern.maxvnodes to something like 20000 - that might be low enough to > not trigger the problem, but also be high enough to not significantly > affect system I/O performance. > > -DG > > David G. Lawrence > President > Download Technologies, Inc. - http://www.downloadtech.com - (866) > 399 8500 > The FreeBSD Project - http://www.freebsd.org > Pave the road of life with opportunities. 
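The hazard described above (a list position that may no longer be valid once the lock is reacquired) can be illustrated outside the kernel. What follows is only a userland sketch under stated assumptions, not the ffs_sync() change discussed in this thread: a pthread mutex stands in for the mount interlock, sched_yield() stands in for uio_yield(), and struct node, struct walk_stats, walk_with_yield() and the choice of a marker node are hypothetical, picked for illustration. Before the lock is dropped, a marker is linked in after the current node, so the walk resumes from the marker rather than trusting a saved next pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical userland stand-in for an entry on a mount's vnode list. */
struct node {
	TAILQ_ENTRY(node) link;
	int is_marker;		/* marker entries carry no work */
};
TAILQ_HEAD(nodelist, node);

struct walk_stats {
	int visited;		/* real (non-marker) nodes seen */
	int yields;		/* times the lock was dropped */
};

/*
 * Walk the list under `lock`, dropping the lock and yielding the CPU
 * every `interval` entries.  A marker is inserted after the current
 * entry before unlocking; other threads may add or remove entries in
 * the meantime, but the marker stays linked, so the walk can resume
 * from it safely.
 */
static struct walk_stats
walk_with_yield(struct nodelist *head, pthread_mutex_t *lock, int interval)
{
	struct walk_stats st = { 0, 0 };
	struct node marker = { .is_marker = 1 };
	struct node *np;
	int since_yield = 0;

	assert(interval > 0);

	pthread_mutex_lock(lock);
	np = TAILQ_FIRST(head);
	while (np != NULL) {
		if (!np->is_marker)
			st.visited++;	/* per-node work would go here */

		if (++since_yield >= interval) {
			since_yield = 0;
			TAILQ_INSERT_AFTER(head, np, &marker, link);
			pthread_mutex_unlock(lock);
			sched_yield();
			pthread_mutex_lock(lock);
			/* np may be gone by now; only the marker is trusted. */
			np = TAILQ_NEXT(&marker, link);
			TAILQ_REMOVE(head, &marker, link);
			st.yields++;
			continue;
		}
		np = TAILQ_NEXT(np, link);
	}
	pthread_mutex_unlock(lock);
	return (st);
}
```

In this sketch the extra cost is one list insert and one list remove per yield, spread over `interval` iterations. Whether that overhead is acceptable in the real ffs_sync() loop is a separate question that this example does not answer.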
From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 19:19:37 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C40DB16A469 for ; Wed, 19 Dec 2007 19:19:37 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outI.internet-mail-service.net (outI.internet-mail-service.net [216.240.47.232]) by mx1.freebsd.org (Postfix) with ESMTP id B080D13C4D5 for ; Wed, 19 Dec 2007 19:19:37 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 19 Dec 2007 11:19:36 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 70EA8126D0A; Wed, 19 Dec 2007 11:19:36 -0800 (PST) Message-ID: <47696EC8.50808@elischer.org> Date: Wed, 19 Dec 2007 11:19:36 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Maxime Henrion References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> <20071219120831.GN71713@elvis.mu.org> In-Reply-To: <20071219120831.GN71713@elvis.mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 19:19:37 -0000 Maxime Henrion wrote: > > It appears that this patch fixed the problem. My gateway server > now has a nearly two days uptime, whereas previously it would have > probably crashed already. 
I'm attaching the final version of the > patch here, since the last one had build-time errors. I'm going > to commit this in HEAD soon unless someone has an objection for it. I haven't looked at the patch in place yet.. are we absolutely sure we are not 'leaking' references to the rt? (I'll check myself when I get a chance in a short while..) > > Cheers, > Maxime > From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 19:41:12 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 736E616A417; Wed, 19 Dec 2007 19:41:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id 167C713C4DD; Wed, 19 Dec 2007 19:41:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBJJf3qs021871 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 06:41:08 +1100 Date: Thu, 20 Dec 2007 06:41:03 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: David G Lawrence In-Reply-To: <20071219170434.GG25053@tnn.dglawrence.com> Message-ID: <20071220051751.E38491@delplex.bde.org> References: <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> <20071220032223.V38101@delplex.bde.org> <20071219170434.GG25053@tnn.dglawrence.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@FreeBSD.ORG,
freebsd-stable@FreeBSD.ORG Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 19:41:12 -0000 On Wed, 19 Dec 2007, David G Lawrence wrote: >> The patch should work fine. IIRC, it yields voluntarily so that other >> things can run. I committed a similar hack for uiomove(). It was > > It patches the bottom of the loop, which is only reached if the vnode > is dirty. So it will only help if there are thousands of dirty vnodes. > While that condition can certainly happen, it isn't the case that I'm > particularly interested in. Oops. When it reaches the bottom of the loop, it will probably block on i/o sometimes, so that the problem is smaller anyway. >> CPUs, everything except interrupts has to wait for these syscalls. Now >> the main problem is to figure out why PREEMPTION doesn't work. I'm >> not working on this directly since I'm running ~5.2 where nearly-full >> kernel preemption doesn't work due to Giant locking. > > I don't understand how PREEMPTION is supposed to work (I mean > to any significant detail), so I can't really comment on that. Me neither, but I will comment anyway :-). I think PREEMPTION should even preempt kernel threads in favor of (higher priority of course) user threads that are in the kernel, but doesn't do this now. Even interrupt threads should have dynamic priorities so that when they become too hoggish they can be preempted even by user threads subject to the this priority rule. This is further from happening. ffs_sync() can hold the mountpoint lock for a long time. That gives problems preempting it. To move your fix to the top of the loop, I think you just need to drop the mountpoint lock every few hundred iterations while yielding. This would help for PREEMPTION too. 
Dropping the lock must be safe because it is already done while flushing. Hmm, the loop is nicely obfuscated and pessimized in current (see rev.1.234). The fast (modulo no cache misses) path used to be just a TAILQ_NEXT() to reach the next vnode, but now unnecessarily joins the slow path at MNT_VNODE_FOREACH(), and MNT_VNODE_FOREACH() hides a function call. Bruce From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 19:44:01 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 974E716A418 for ; Wed, 19 Dec 2007 19:44:01 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outC.internet-mail-service.net (outC.internet-mail-service.net [216.240.47.226]) by mx1.freebsd.org (Postfix) with ESMTP id 7ECA613C448 for ; Wed, 19 Dec 2007 19:44:01 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 19 Dec 2007 11:44:00 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 26311126CF7; Wed, 19 Dec 2007 11:44:00 -0800 (PST) Message-ID: <47697480.9070208@elischer.org> Date: Wed, 19 Dec 2007 11:44:00 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: David G Lawrence References: <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org> <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> <20071220032223.V38101@delplex.bde.org> <20071219170434.GG25053@tnn.dglawrence.com> In-Reply-To: <20071219170434.GG25053@tnn.dglawrence.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed 
Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 19:44:01 -0000 David G Lawrence wrote: >>> In any case, it appears that my patch is a no-op, at least for the >>> problem I was trying to solve. This has me confused, however, because at >>> one point the problem was mitigated with it. The patch has gone through >>> several iterations, however, and it could be that it was made to the top >>> of the loop, before any of the checks, in a previous version. Hmmm. >> The patch should work fine. IIRC, it yields voluntarily so that other >> things can run. I committed a similar hack for uiomove(). It was > > It patches the bottom of the loop, which is only reached if the vnode > is dirty. So it will only help if there are thousands of dirty vnodes. > While that condition can certainly happen, it isn't the case that I'm > particularly interested in. > >> CPUs, everything except interrupts has to wait for these syscalls. Now >> the main problem is to figure out why PREEMPTION doesn't work. I'm >> not working on this directly since I'm running ~5.2 where nearly-full >> kernel preemption doesn't work due to Giant locking. > > I don't understand how PREEMPTION is supposed to work (I mean > to any significant detail), so I can't really comment on that. It's really very simple. When you do a "wakeup" (or anything else that puts a thread on a run queue, i.e. uses setrunqueue()), then if that thread has a higher priority than you do (and in the general case is an interrupt thread), you immediately call mi_switch() so that it runs immediately. You are guaranteed to run again when it finishes (you are not just put back on the run queue at the end).
The critical_enter()/critical_exit() calls prevent this from happening to you if you really must not be interrupted by another thread. There is an option where it is not just interrupt threads that can jump in, but I think it's usually disabled. > > -DG > > David G. Lawrence > President > Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 > The FreeBSD Project - http://www.freebsd.org > Pave the road of life with opportunities. > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 19:53:07 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E8D616A417 for ; Wed, 19 Dec 2007 19:53:07 +0000 (UTC) (envelope-from Stephen.Clark@seclark.us) Received: from smtpout06.prod.mesa1.secureserver.net (smtpout06-04.prod.mesa1.secureserver.net [64.202.165.227]) by mx1.freebsd.org (Postfix) with SMTP id 07ECE13C457 for ; Wed, 19 Dec 2007 19:53:06 +0000 (UTC) (envelope-from Stephen.Clark@seclark.us) Received: (qmail 30157 invoked from network); 19 Dec 2007 19:53:04 -0000 Received: from unknown (24.144.77.185) by smtpout06-04.prod.mesa1.secureserver.net (64.202.165.227) with ESMTP; 19 Dec 2007 19:53:02 -0000 Message-ID: <4769769C.9070802@seclark.us> Date: Wed, 19 Dec 2007 14:53:00 -0500 From: Stephen Clark User-Agent: Mozilla Thunderbird 1.0.8-1.1.fc4 (X11/20060501) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> <20071219120831.GN71713@elvis.mu.org> <47696EC8.50808@elischer.org> In-Reply-To: <47696EC8.50808@elischer.org> Content-Type:
text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maxime Henrion , Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Stephen.Clark@seclark.us List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 19:53:07 -0000 Julian Elischer wrote: > Maxime Henrion wrote: > >> >> It appears that this patch fixed the problem. My gateway server >> now has a nearly two days uptime, whereas previously it would have >> probably crashed already. I'm attaching the final version of the >> patch here, since the last one had build-time errors. I'm going >> to commit this in HEAD soon unless someone has an objection for it. > > > I haven't looked at the patch in place yet.. > > are we absolutly sure we arr not 'leaking' references to the rt? > > (I'll check myself when I get a chance in a short while..) > >> >> Cheers, >> Maxime >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > Hello, Would this also be a problem that could occur using quagga with zebra and ospfd? Thanks, Steve -- "They that give up essential liberty to obtain temporary safety, deserve neither liberty nor safety." (Ben Franklin) "The course of history shows that as a government grows, liberty decreases." 
(Thomas Jefferson) From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 19:57:27 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84DF616A41B for ; Wed, 19 Dec 2007 19:57:27 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outI.internet-mail-service.net (outI.internet-mail-service.net [216.240.47.232]) by mx1.freebsd.org (Postfix) with ESMTP id 749EF13C46A for ; Wed, 19 Dec 2007 19:57:27 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 19 Dec 2007 11:57:26 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id C72C3126CF7; Wed, 19 Dec 2007 11:57:25 -0800 (PST) Message-ID: <476977A5.8080004@elischer.org> Date: Wed, 19 Dec 2007 11:57:25 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Stephen.Clark@seclark.us References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> <20071219120831.GN71713@elvis.mu.org> <47696EC8.50808@elischer.org> <4769769C.9070802@seclark.us> In-Reply-To: <4769769C.9070802@seclark.us> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maxime Henrion , Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 19:57:27 -0000 Stephen Clark wrote: > Julian Elischer wrote: > >> Maxime Henrion wrote: >> >>> >>> It appears that this patch fixed the problem. 
My gateway server >>> now has a nearly two days uptime, whereas previously it would have >>> probably crashed already. I'm attaching the final version of the >>> patch here, since the last one had build-time errors. I'm going >>> to commit this in HEAD soon unless someone has an objection for it. >> >> >> I haven't looked at the patch in place yet.. >> >> are we absolutly sure we arr not 'leaking' references to the rt? >> >> (I'll check myself when I get a chance in a short while..) >> >>> >>> Cheers, >>> Maxime >>> >> >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> > Hello, > > Would this also be a problem that could occur using quagga with zebra > and ospfd? Sure, if they use routing sockets. > > Thanks, > Steve > From owner-freebsd-net@FreeBSD.ORG Wed Dec 19 20:47:22 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E35616A41B; Wed, 19 Dec 2007 20:47:22 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id C241613C458; Wed, 19 Dec 2007 20:47:21 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J55oz-000MMU-8c; Wed, 19 Dec 2007 22:47:20 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBJKlDd2009333; Wed, 19 Dec 2007 22:47:13 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBJKlCAJ009332; Wed, 19 Dec 2007 22:47:12 +0200 (EET) 
(envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 19 Dec 2007 22:47:12 +0200 From: Kostik Belousov To: Julian Elischer Message-ID: <20071219204712.GE57756@deviant.kiev.zoral.com.ua> References: <20071218233644.U756@besplex.bde.org> <20071218141742.GS25053@tnn.dglawrence.com> <20071219022102.I34422@delplex.bde.org> <20071218165732.GV25053@tnn.dglawrence.com> <20071218181023.GW25053@tnn.dglawrence.com> <20071219235444.K928@besplex.bde.org> <20071219151926.GA25053@tnn.dglawrence.com> <20071220032223.V38101@delplex.bde.org> <20071219170434.GG25053@tnn.dglawrence.com> <47697480.9070208@elischer.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ffoCPvUAPMgSXi6H" Content-Disposition: inline In-Reply-To: <47697480.9070208@elischer.org> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 3f6c5f504711b4b2e39067d20b60c9d6 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1930 [Dec 19 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Dec 2007 20:47:22 -0000 --ffoCPvUAPMgSXi6H Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 19, 2007 at 11:44:00AM -0800, Julian Elischer 
wrote: > David G Lawrence wrote: > >>> In any case, it appears that my patch is a no-op, at least for the > >>>problem I was trying to solve. This has me confused, however, because at > >>>one point the problem was mitigated with it. The patch has gone through > >>>several iterations, however, and it could be that it was made to the top > >>>of the loop, before any of the checks, in a previous version. Hmmm. > >>The patch should work fine. IIRC, it yields voluntarily so that other > >>things can run. I committed a similar hack for uiomove(). It was > > > > It patches the bottom of the loop, which is only reached if the vnode > >is dirty. So it will only help if there are thousands of dirty vnodes. > >While that condition can certainly happen, it isn't the case that I'm > >particularly interested in. > > > >>CPUs, everything except interrupts has to wait for these syscalls. Now > >>the main problem is to figure out why PREEMPTION doesn't work. I'm > >>not working on this directly since I'm running ~5.2 where nearly-full > >>kernel preemption doesn't work due to Giant locking. > > > > I don't understand how PREEMPTION is supposed to work (I mean > >to any significant detail), so I can't really comment on that. > > It's really very simple. > > When you do a "wakeup" > (or anything else that puts a thread on a run queue) > i.e. use setrunqueue() > then if that thread has more priority than you do, (and in the general case > is an interrupt thread), you immediately call mi_switch so that it runs > immediately. > You are guaranteed to run again when it finishes. > (you are not just put back on the run queue at the end). As far as I see it, only the interrupt threads can put the kernel thread off the CPU. Moreover, the thread being forced out shall be an "idle user thread". See kern_switch.c, maybe_preempt(), the #ifndef FULL_PREEMPTION block.
> > the critical_enter()/critical_exit() calls disable this from happening to > you if you really must not be interrupted by another thread. > > there is an option where it is not just interrupt threads that can jump in, > but I think it's usually disabled. Do you mean FULL_PREEMPTION? --ffoCPvUAPMgSXi6H Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHaYNPC3+MBN1Mb4gRAs3VAKCAuYRei7c6tM7PCglA0MhS1wv9YgCg2qQD rtEKhVPITTealtAh8v2AClM= =as0Z -----END PGP SIGNATURE----- --ffoCPvUAPMgSXi6H-- From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 07:33:10 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 88C2816A419; Thu, 20 Dec 2007 07:33:10 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6838013C45A; Thu, 20 Dec 2007 07:33:10 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBK7XAqY009024; Thu, 20 Dec 2007 07:33:10 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBK7XA78009020; Thu, 20 Dec 2007 07:33:10 GMT (envelope-from remko) Date: Thu, 20 Dec 2007 07:33:10 GMT Message-Id: <200712200733.lBK7XA78009020@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: kern/118879: [bge] [patch] bge has checksum problems on the 5703 chipset X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20
Dec 2007 07:33:10 -0000 Synopsis: [bge] [patch] bge has checksum problems on the 5703 chipset Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: remko Responsible-Changed-When: Thu Dec 20 07:32:58 UTC 2007 Responsible-Changed-Why: This seems like something for -net http://www.freebsd.org/cgi/query-pr.cgi?pr=118879 From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 08:10:36 2007 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB34516A418; Thu, 20 Dec 2007 08:10:36 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 89E0D13C442; Thu, 20 Dec 2007 08:10:36 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id lBK8Aakb030443; Thu, 20 Dec 2007 08:10:36 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id lBK8AaaM030439; Thu, 20 Dec 2007 08:10:36 GMT (envelope-from linimon) Date: Thu, 20 Dec 2007 08:10:36 GMT Message-Id: <200712200810.lBK8AaaM030439@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/118880: [ipv6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented for IPv6 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 08:10:36 -0000 Synopsis: [ipv6] IP_RECVDSTADDR & IP_SENDSRCADDR not implemented for IPv6 Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Thu Dec 20 08:10:27 
UTC 2007 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=118880 From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 08:36:02 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81D1316A41A; Thu, 20 Dec 2007 08:36:02 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 1A8D913C44B; Thu, 20 Dec 2007 08:36:01 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBK8Zuap004999 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Dec 2007 19:35:57 +1100 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.1/8.14.1) with ESMTP id lBK8ZuxA064436; Thu, 20 Dec 2007 19:35:56 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.1/8.14.1/Submit) id lBK8Zt5O064435; Thu, 20 Dec 2007 19:35:55 +1100 (EST) (envelope-from peter) Date: Thu, 20 Dec 2007 19:35:55 +1100 From: Peter Jeremy To: Mark Fullmer Message-ID: <20071220083555.GO79196@server.vk2pj.dyndns.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="gKMricLos+KVdGMg" Content-Disposition: inline In-Reply-To: <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.16 (2007-06-09) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org 
Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 08:36:02 -0000 --gKMricLos+KVdGMg Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 19, 2007 at 12:06:59PM -0500, Mark Fullmer wrote: >Thanks for the other info on timer resolution, I overlooked >clock_gettime(). If you have a UP system with a usable TSC (or equivalent) then using rdtsc() (or equivalent) is a much cheaper way to measure short durations with high resolution. --=20 Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. --gKMricLos+KVdGMg Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFHailr/opHv/APuIcRAqRQAJ4x4iTL9WiMWXy6VHcl9DLyMRWLEACdHJXI I3yP2HkQ+YAf+Ka8s/qSEoM= =6ncl -----END PGP SIGNATURE----- --gKMricLos+KVdGMg-- From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 10:53:30 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3433B16A417 for ; Thu, 20 Dec 2007 10:53:30 +0000 (UTC) (envelope-from randy@psg.com) Received: from rip.psg.com (rip.psg.com [147.28.0.39]) by mx1.freebsd.org (Postfix) with ESMTP id 1760D13C458 for ; Thu, 20 Dec 2007 10:53:29 +0000 (UTC) (envelope-from randy@psg.com) Received: from 50.216.138.210.bn.2iij.net ([210.138.216.50] helo=[192.168.0.10]) by rip.psg.com with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68 (FreeBSD)) (envelope-from ) id 1J5J1t-000K9A-0H; Thu, 20 Dec 2007 10:53:29 +0000 Message-ID: <476A4998.1010808@psg.com> Date: Thu, 20 Dec 2007 19:53:12 +0900 
From: Randy Bush User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: FreeBSD Net X-Enigmail-Version: 0.95.5 Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: userland ppp depends on ldconfig X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 10:53:30 -0000 i have a dedicated ppp link that has to come up at boot. in /etc/rc.conf, i have # User ppp configuration. ppp_enable=YES ppp_mode=dedicated ppp_nat=YES ppp_profile=frob during boot, i was getting /libexec/ld-elf.so.1: Shared object "libintl.so.8" not found, required by "su" and ppp was not starting. this was fixed by --- /etc/rc.d/ppp~ 2007-12-17 21:41:59.000000000 +0000 +++ /etc/rc.d/ppp 2007-12-24 05:14:14.000000000 +0000 @@ -4,7 +4,7 @@ # # PROVIDE: ppp -# REQUIRE: netif isdnd +# REQUIRE: netif isdnd ldconfig # KEYWORD: nojail . /etc/rc.subr was there a more proper fix i should have used? 
randy From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 10:58:37 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC46F16A41A; Thu, 20 Dec 2007 10:58:37 +0000 (UTC) (envelope-from sem@FreeBSD.org) Received: from mail.ciam.ru (ns.ciam.ru [213.247.195.75]) by mx1.freebsd.org (Postfix) with ESMTP id A984B13C458; Thu, 20 Dec 2007 10:58:37 +0000 (UTC) (envelope-from sem@FreeBSD.org) Received: from dhcp250-210.yandex.ru ([87.250.250.210]) by mail.ciam.ru with esmtpa (Exim 4.x) id 1J5IW2-000Dye-T2; Thu, 20 Dec 2007 13:20:34 +0300 Message-ID: <476A4154.6000304@FreeBSD.org> Date: Thu, 20 Dec 2007 13:17:56 +0300 From: Sergey Matveychuk User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Julian Elischer References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> <20071217101009.GL71713@elvis.mu.org> <20071219120831.GN71713@elvis.mu.org> <47696EC8.50808@elischer.org> <4769769C.9070802@seclark.us> <476977A5.8080004@elischer.org> In-Reply-To: <476977A5.8080004@elischer.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maxime Henrion , Stephen.Clark@seclark.us, Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 10:58:38 -0000 Julian Elischer wrote: >> >> Would this also be a problem that could occur using quagga with zebra >> and ospfd? > > Sure, if they use routing sockets. > Of course, they are do. -- Dixi. Sem. 
From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 11:03:43 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E9FC16A418 for ; Thu, 20 Dec 2007 11:03:43 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id B0EA213C45D for ; Thu, 20 Dec 2007 11:03:42 +0000 (UTC) (envelope-from andre@freebsd.org) Received: (qmail 30324 invoked from network); 20 Dec 2007 10:04:48 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 20 Dec 2007 10:04:48 -0000 Message-ID: <476A45D6.6030305@freebsd.org> Date: Thu, 20 Dec 2007 11:37:10 +0100 From: Andre Oppermann User-Agent: Thunderbird 1.5.0.13 (Windows/20070809) MIME-Version: 1.0 To: Lawrence Stewart References: <20071219123305.Y95322@fledge.watson.org> <47693DBD.6050104@swin.edu.au> In-Reply-To: <47693DBD.6050104@swin.edu.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: James Healy , arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 11:03:43 -0000 Lawrence Stewart wrote: > Hi Robert, > > Comments inline. > > Robert Watson wrote: >> >> Dear all, >> >> It is rapidly becoming clear that quite a few of us have Big Plans for >> the TCP implementation over the next 12-18 months. It's important >> that we get the plans out on the table now so that everyone working on >> these projects is aware of the larger context. 
This will encourage >> collaboration, but also allow us to manage the risks inevitably >> associated with having several simultaneous projects going on in a >> very complex software base. With that in mind, here are the large >> projects I'm currently aware of: >> >> Project Flag Wavers Status >> ------- ----------- ------ >> TCP offload Kip Macy Moving to CVS and under >> review and testing; one >> supporting device driver. >> >> TCP congestion control Sam Leffler, At least one prototype >> Rui Paulo, implementation, to move to p4 >> Andre Oppermann, >> Kip Macy, >> Lawrence Stewart, >> James Healy >> >> TCP overhaul Andre Oppermann Glimmer in eye, to move to >> p4. >> >> TCP lock granularity/ Robert Watson Glimmer in eye, to occur in >> increased parallelism p4. >> >> TCP timer unification Andre Oppermann, Previously committed, and to >> Mike Silbersack be reintroduced via p4. >> >> Monitoring ABI cleanup Robert Watson Glimmer in eye, to >> occur in >> p4. >> >> Looking at the above, it sounds like a massive amount of work taking >> place, so we will need to coordinate carefully. I'd like to encourage >> people to avoid creating unnecessary dependencies between changes, and >> to be especially careful in coordinating potentially MFCable changes. >> There are (at least) two conflicting scheduling desires in play here: >> >> - A desire to merge MFCable changes early, so that they aren't >> entangled with >> un-mergeable changes. This will simplify merging and also maximize the >> extent to which testing in HEAD will apply to them once merged to >> RELENG_7. >> >> - A desire to merge large-scale infrastructural changes early so that >> they see >> the greatest exposure, and so that they can be introduced >> incrementally over >> a longer period of time to shake each out. >> >> Both of these are valid perspectives, and will need to be balanced. 
I >> have a few questions, then, for people involved in these or other >> projects: >> >> (0) Is your project in the above list? If not, could you send out a >> reply >> talking a bit about the project, who's involved, where it's taking >> place, >> etc. > > Rui@ recently posted a TCP ECN patch that probably belongs in the list > (http://lists.freebsd.org/pipermail/freebsd-net/2007-November/015979.html) > unless it has already recently been committed. > > > Jim and I recently discussed the idea of implementing autotuning of the > TCP reassembly queue size based on analysis of some experimental work > we've been doing. It's a small project, but we feel it would be worth > implementing. Details follow... > > > Problem description: > > Currently, "net.inet.tcp.reass.maxqlen" specifies the maximum number of > segments that can be held in the reassembly queue for a TCP connection. > The current default value is 48, which equates to approx. 69k of buffer > space if MSS = 1448 bytes. This means that if the TCP window grows to be > more than 48 segments wide, and a packet is lost, the receiver will > buffer the next 48 segments in the reassembly queue and subsequently > drop all the remaining segments in the window because the reassembly > buffer is full i.e. 1 packet loss in the network can equate to many > packet losses at the receiver because of insufficient buffering. This > obviously has a negative impact on performance in environments where > there is non-zero packet loss. > > With the addition of automatic socket buffer tuning in FreeBSD 7, the > ability for the TCP window to grow above 48 segments is going to be even > more prevalent than it is now, so this issue will continue to affect > connections to FreeBSD based TCP receivers. > > We observed that the socket receive buffer size provides a good > indication of the expected number of bytes in flight for a connection, > and can therefore serve as the figure to base the size of the reassembly > queue on. 
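For anyone wanting to check the numbers in the problem description quoted above, here is a minimal C sketch of the arithmetic. It is only an illustration under stated assumptions, not code from the FreeBSD tree: the function names are hypothetical, the loss model assumes that exactly the first segment of a window is lost and everything behind it arrives out of order, and the autosizing rule (receive buffer size divided by MSS, rounded up) is one plausible reading of the proposal rather than the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload bytes the reassembly queue can hold when it is limited to
 * maxqlen segments of mss bytes each (cf. net.inet.tcp.reass.maxqlen).
 */
size_t
reass_queue_bytes(size_t maxqlen, size_t mss)
{
	return (maxqlen * mss);
}

/*
 * Simplified loss model (an assumption for illustration): the first
 * segment of a window of window_segs segments is lost, so the other
 * window_segs - 1 segments arrive out of order.  The queue keeps at
 * most maxqlen of them; the rest are dropped by the receiver.
 */
size_t
reass_segments_dropped(size_t window_segs, size_t maxqlen)
{
	size_t out_of_order;

	assert(window_segs > 0);
	out_of_order = window_segs - 1;
	return (out_of_order > maxqlen ? out_of_order - maxqlen : 0);
}

/*
 * Hypothetical per-connection limit derived from the socket receive
 * buffer size: enough segments to cover the buffer, rounded up.
 */
size_t
reass_autosize_maxqlen(size_t rcvbuf_bytes, size_t mss)
{
	assert(mss > 0);
	return ((rcvbuf_bytes + mss - 1) / mss);
}
```

With the defaults mentioned above, reass_queue_bytes(48, 1448) is 69504 bytes, which is the "approx. 69k" figure. Under the simplified loss model, a 100-segment window with one loss leaves 99 out-of-order segments, of which 51 would be dropped with a 48-segment limit, while a 256 KB receive buffer would translate into a limit of 182 segments under the assumed autosizing rule.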
I've got a rewritten and much more efficient tcp_reass() function in my local tree. I'll import it into Perforce next week with all the other stuff. You may want to base your auto-sizing work on it. The only missing part is some statistics gathering. -- Andre > Basic project description: > > - Make the reassembly queue's max length a per-connection variable to > appropriately tailor the reassembly queue buffer size for each connection > > - Piggyback automated reassembly queue sizing with the code that resizes > the socket receive buffer > > - The socket buffer tuning code already has the required infrastructure > to cap the max buffer size, so this would implicitly limit the size of > the reassembly queue > > - If the socket buffer sizes were explicitly overridden using sockopts > (e.g. to support large windows for particular apps), the reassembly > queue would grow to accommodate only connections using the larger than > normal receive buffer. > > - The net.inet.tcp.reass.maxsegments tunable would still be left intact > to ensure users can set a hard cap on the max amount of memory allowed > for reassembly buffering. > >> >> (1) What is your availability to shepherd the project through its entire >> cycle, including early prototyping, design review, development, >> implementation review, testing, and the inevitable long debugging >> tail >> that all TCP projects have. > > We should be able to run the reassembly queue project full cycle. > >> >> (2) When do you think your implementation will reach a prototype phase >> appropriate for an expanded circle of reviewers? When do you >> think it >> might be ready for commit? Keep in mind that we're now a month or >> so into >> the 18-month cycle for 8.0, and that all serious TCP work should be >> completed at least six months before the end of the cycle. > > To be safe, I'll say we should have a prototype ready by the end of Feb > 2008, though I suspect we'll have something ready sooner than that. 
> Commit ready code should follow very shortly after that (few weeks at > most), as we anticipate that the patch will be very simple. > >> >> (3) What potential interactions of note exist between your project and >> the >> others being planned. Are there explicit dependencies? > > The "TCP Overhaul" project would possibly alter the location of the > changes, but shouldn't affect the essence of the changes themselves. > It's unlikely any of the other projects would affect this one. > >> >> (4) Do you anticipate an MFC cycle for your work to RELENG_7? > > Yes. A munged version could also be made available for RELENG_6.... it > just wouldn't be based on automatic receive buffer tuning, and would > probably be based on a static calculation during connection initialisation. > >> >> I'd like for us to create a wiki page tracking these various projects, >> and pointing at per-project resources. Once the discussion has >> settled a bit, I can take responsibility for creating such a page, but >> will need everyone involved to help maintain it, as well as to >> maintain pages (on the wiki or elsewhere) regarding the status of the >> projects. I think it also makes a lot of sense for participants in >> the projects to send occasional updates and reports to net@/arch@ in >> order to keep people who can't track things day-to-date in the loop, >> and to invite review. > > Sounds fair. 
> > [snip] > > Cheers, > Jim and Lawrence > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > > From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 13:56:06 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D947B16A420; Thu, 20 Dec 2007 13:56:06 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 9662713C474; Thu, 20 Dec 2007 13:56:06 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 2AF3946F7A; Thu, 20 Dec 2007 08:56:06 -0500 (EST) Date: Thu, 20 Dec 2007 13:56:06 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: net@FreeBSD.org Message-ID: <20071220135342.O67327@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.org Subject: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 13:56:06 -0000 Per earlier e-mail, I've created a page to track the various on-going projects: http://wiki.freebsd.org/TCPProjects8 Rui has already kindly added the TCP ECN work to the page. 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 15:25:34 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5730A16A417; Thu, 20 Dec 2007 15:25:33 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (cl-162.ewr-01.us.sixxs.net [IPv6:2001:4830:1200:a1::2]) by mx1.freebsd.org (Postfix) with ESMTP id C824913C459; Thu, 20 Dec 2007 15:25:32 +0000 (UTC) (envelope-from brooks@lor.one-eyed-alien.net) Received: from lor.one-eyed-alien.net (localhost [127.0.0.1]) by lor.one-eyed-alien.net (8.14.1/8.13.8) with ESMTP id lBKFPVk6053771; Thu, 20 Dec 2007 09:25:31 -0600 (CST) (envelope-from brooks@lor.one-eyed-alien.net) Received: (from brooks@localhost) by lor.one-eyed-alien.net (8.14.1/8.13.8/Submit) id lBKFPVd7053770; Thu, 20 Dec 2007 09:25:31 -0600 (CST) (envelope-from brooks) Date: Thu, 20 Dec 2007 09:25:31 -0600 From: Brooks Davis To: Robert Watson Message-ID: <20071220152531.GA53327@lor.one-eyed-alien.net> References: <20071219123305.Y95322@fledge.watson.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="LQksG6bCIzRHxTLp" Content-Disposition: inline In-Reply-To: <20071219123305.Y95322@fledge.watson.org> User-Agent: Mutt/1.5.16 (2007-06-09) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (lor.one-eyed-alien.net [127.0.0.1]); Thu, 20 Dec 2007 09:25:31 -0600 (CST) Cc: James Healy , arch@freebsd.org, Lawrence Stewart , net@freebsd.org Subject: Re: Coordinating TCP projects X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 15:25:34 -0000 --LQksG6bCIzRHxTLp Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Dec 19, 2007 at 01:09:13PM +0000, Robert Watson wrote: > > I'd like for us to create a wiki page tracking these various projects, and > pointing at per-project resources. Once the discussion has settled a bit, > I can take responsibility for creating such a page, but will need everyone > involved to help maintain it, as well as to maintain pages (on the wiki or > elsewhere) regarding the status of the projects. I think it also makes a > lot of sense for participants in the projects to send occasional updates > and reports to net@/arch@ in order to keep people who can't track things > day-to-date in the loop, and to invite review. In addition to the wiki, I think it's important to emphasize that as a matter of principle, if it's not public, then in practice it doesn't exist. This means that if you want people to coordinate their changes with your changes, you need to be working in public. You should ideally be working in perforce, but at a minimum regular posts discussing details are required. Non-public work (existent or not) will not be permitted to delay the inclusion of desirable features that are reviewed and tested. 
-- Brooks --LQksG6bCIzRHxTLp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQFHaolqXY6L6fI4GtQRAgVSAJ9+Voftec9WDAzSCv9/q+tq192JqwCeMAnN 8jlDdU08+rOjrPz01GcBwWA= =hwMu -----END PGP SIGNATURE----- --LQksG6bCIzRHxTLp-- From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 18:01:32 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB98916A46B for ; Thu, 20 Dec 2007 18:01:32 +0000 (UTC) (envelope-from chuckr@chuckr.org) Received: from mail6.sea5.speakeasy.net (mail6.sea5.speakeasy.net [69.17.117.8]) by mx1.freebsd.org (Postfix) with ESMTP id AFBCD13C469 for ; Thu, 20 Dec 2007 18:01:32 +0000 (UTC) (envelope-from chuckr@chuckr.org) Received: (qmail 20746 invoked from network); 20 Dec 2007 17:34:51 -0000 Received: from april.chuckr.org (chuckr@[66.92.151.30]) (envelope-sender ) by mail6.sea5.speakeasy.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 20 Dec 2007 17:34:51 -0000 Message-ID: <476AA708.1090908@chuckr.org> Date: Thu, 20 Dec 2007 12:31:52 -0500 From: Chuck Robey User-Agent: Thunderbird 2.0.0.6 (X11/20071107) MIME-Version: 1.0 To: Robert Watson References: <20071220135342.O67327@fledge.watson.org> In-Reply-To: <20071220135342.O67327@fledge.watson.org> X-Enigmail-Version: 0.95.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 18:01:33 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Robert Watson wrote: > > Per earlier e-mail, I've created a page to track the various on-going > projects: > > 
http://wiki.freebsd.org/TCPProjects8 > > Rui has already kindly added the TCP ECN work to the page. > Things like this should definitely be publicized on the FreeBSD main web page, because it's stuff like this I find exciting. > Robert N M Watson > Computer Laboratory > University of Cambridge > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHaqcIz62J6PPcoOkRApN0AJ9zo/3MzQnDs2FtIuKYyje6L5u9VgCfTRpX aFBUQLFM2Dbx9U5P2Jv1Az8= =FSIC -----END PGP SIGNATURE----- From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 18:12:53 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6E30816A46B for ; Thu, 20 Dec 2007 18:12:53 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outJ.internet-mail-service.net (outJ.internet-mail-service.net [216.240.47.233]) by mx1.freebsd.org (Postfix) with ESMTP id 4982D13C47E for ; Thu, 20 Dec 2007 18:12:53 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 20 Dec 2007 10:12:52 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 92D3A126DB3; Thu, 20 Dec 2007 10:12:51 -0800 (PST) Message-ID: <476AB0A2.8070501@elischer.org> Date: Thu, 20 Dec 2007 10:12:50 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: Sergey Matveychuk References: <20071213133817.GC71713@elvis.mu.org> <47617AF5.7070701@elischer.org> <20071214092539.GB14339@glebius.int.ru> <4762DD82.9070904@elischer.org> 
<20071217101009.GL71713@elvis.mu.org> <20071219120831.GN71713@elvis.mu.org> <47696EC8.50808@elischer.org> <4769769C.9070802@seclark.us> <476977A5.8080004@elischer.org> <476A4154.6000304@FreeBSD.org> In-Reply-To: <476A4154.6000304@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Maxime Henrion , Stephen.Clark@seclark.us, Gleb Smirnoff , net@FreeBSD.org Subject: Re: Deadlock in the routing code X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 18:12:53 -0000 Sergey Matveychuk wrote: > Julian Elischer wrote: >>> >>> Would this also be a problem that could occur using quagga with zebra >>> and ospfd? >> >> Sure, if they use routing sockets. >> > > Of course, they are do. then the answer is yes :-) > From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 18:38:06 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E490B16A46E for ; Thu, 20 Dec 2007 18:38:06 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outJ.internet-mail-service.net (outJ.internet-mail-service.net [216.240.47.233]) by mx1.freebsd.org (Postfix) with ESMTP id C238A13C457 for ; Thu, 20 Dec 2007 18:38:06 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 20 Dec 2007 10:38:06 -0800 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 882B2126D39; Thu, 20 Dec 2007 10:38:05 -0800 (PST) Message-ID: <476AB68C.30201@elischer.org> Date: Thu, 20 Dec 2007 10:38:04 -0800 From: Julian Elischer User-Agent: Thunderbird 2.0.0.9 (Macintosh/20071031) MIME-Version: 1.0 To: dima <_pppp@mail.ru> 
References: <20071220135342.O67327@fledge.watson.org> In-Reply-To: Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, Robert Watson , net@FreeBSD.org Subject: Re: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 18:38:07 -0000 dima wrote: >> Per earlier e-mail, I've created a page to track the various on-going >> projects: >> >> http://wiki.freebsd.org/TCPProjects8 >> >> Rui has already kindly added the TCP ECN work to the page. > > As I know, we have a single swi:net thread in the kernel yet. Are there any plans to make several such threads? If yes, this activity isn't mentioned in wiki. > There are 2 ideas: > 1. per-core thread > 2. per-interface thread and for my system with 64 virtual interfaces? > I like the second more. > > Regards, > Dmitriy Marchenko. > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 20:45:53 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD3D016A41A for ; Thu, 20 Dec 2007 20:45:53 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 59C5B13C455 for ; Thu, 20 Dec 2007 20:45:53 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 79588 invoked from network); 20 Dec 2007 20:45:52 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) 
(192.148.251.71) by sv1.eng.oar.net with SMTP; 20 Dec 2007 20:45:52 -0000 In-Reply-To: <20071219181158.GC57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071219181158.GC57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <1C1F9DB7-1B79-4718-9A27-379D1E6F0F10@eng.oar.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Thu, 20 Dec 2007 15:45:35 -0500 To: Kostik Belousov X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 20:45:53 -0000 Thanks, I'll test this later on today. On Dec 19, 2007, at 1:11 PM, Kostik Belousov wrote: > On Wed, Dec 19, 2007 at 09:13:31AM -0800, David G Lawrence wrote: >>>> Try it with "find / -type f >/dev/null" to duplicate the problem >>>> almost >>>> instantly. >>> >>> I was able to verify last night that (cd /; tar -cpf -) > all.tar >>> would >>> trigger the problem. I'm working getting a test running with >>> David's ffs_sync() workaround now, adding a few counters there >>> should >>> get this narrowed down a little more. >> >> Unfortunately, the version of the patch that I sent out isn't >> going to >> help your problem. It needs to yield at the top of the loop, but >> vp isn't >> necessarily valid after the wakeup from the msleep. That's a >> problem that >> I'm having trouble figuring out a solution to - the solutions that >> come >> to mind will all significantly increase the overhead of the loop. 
>> As a very inadequate work-around, you might consider lowering >> kern.maxvnodes to something like 20000 - that might be low enough to >> not trigger the problem, but also be high enough to not significantly >> affect system I/O performance. > > I think the following may be safe. It counts only the clean scanned > vnodes > and does not evaluate the vp, that indeed may be reclaimed, after > the sleep. > > I never booted with the change. > > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > index cbccc62..e686b97 100644 > --- a/sys/ufs/ffs/ffs_vfsops.c > +++ b/sys/ufs/ffs/ffs_vfsops.c > @@ -1176,6 +1176,7 @@ ffs_sync(mp, waitfor, td) > struct ufsmount *ump = VFSTOUFS(mp); > struct fs *fs; > int error, count, wait, lockreq, allerror = 0; > + int yield_count; > int suspend; > int suspended; > int secondary_writes; > @@ -1216,6 +1217,7 @@ loop: > softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps); > MNT_ILOCK(mp); > > + yield_count = 0; > MNT_VNODE_FOREACH(vp, mp, mvp) { > /* > * Depend on the mntvnode_slock to keep things stable enough > @@ -1233,6 +1235,11 @@ loop: > (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && > vp->v_bufobj.bo_dirty.bv_cnt == 0)) { > VI_UNLOCK(vp); > + if (yield_count++ == 500) { > + yield_count = 0; > + msleep(&yield_count, MNT_MTX(mp), PZERO, > + "ffspause", 1); > + } > continue; > } > MNT_IUNLOCK(mp); From owner-freebsd-net@FreeBSD.ORG Thu Dec 20 21:45:39 2007 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D9DF516A419; Thu, 20 Dec 2007 21:45:39 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from mx28.mail.ru (mx28.mail.ru [194.67.23.67]) by mx1.freebsd.org (Postfix) with ESMTP id 8D3B213C44B; Thu, 20 Dec 2007 21:45:39 +0000 (UTC) (envelope-from vadim_nuclight@mail.ru) Received: from mx40.mail.ru (mx40.mail.ru [194.67.23.36]) by mx28.mail.ru (mPOP.Fallback_MX) with ESMTP id 
DD96C7889BD; Thu, 20 Dec 2007 21:49:36 +0300 (MSK) Received: from [78.140.3.25] (port=18063 helo=nuclight.avtf.net) by mx40.mail.ru with esmtp id 1J5QSb-000HS6-00; Thu, 20 Dec 2007 21:49:33 +0300 To: "Julian Elischer" , dima <_pppp@mail.ru> References: <20071220135342.O67327@fledge.watson.org> <476AB68C.30201@elischer.org> Message-ID: Date: Fri, 21 Dec 2007 00:49:29 +0600 From: "Vadim Goncharov" Organization: AVTF TPU Hostel Content-Type: text/plain; format=flowed; delsp=yes; charset=koi8-r MIME-Version: 1.0 Content-Transfer-Encoding: 8bit In-Reply-To: <476AB68C.30201@elischer.org> User-Agent: Opera M2/7.54 (Win32, build 3865) Cc: arch@freebsd.org, Robert Watson , net@freebsd.org Subject: Re: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Dec 2007 21:45:40 -0000 21.12.07 @ 00:38 Julian Elischer wrote: >> As I know, we have a single swi:net thread in the kernel yet. Are >> there any plans to make several such threads? If yes, this activity >> isn't mentioned in wiki. >> There are 2 ideas: >> 1. per-core thread >> 2. per-interface thread >> I like the second more. > > and for my system with 64 virtual interfaces? Surely, per-core thread is enough - why have too much synchronization overhead?.. A computer is a state machine. Threads are for people who can't program state machines. 
(c) Alan Cox -- WBR, Vadim Goncharov From owner-freebsd-net@FreeBSD.ORG Fri Dec 21 00:55:46 2007 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55DFD16A417; Fri, 21 Dec 2007 00:55:46 +0000 (UTC) (envelope-from _pppp@mail.ru) Received: from mx28.mail.ru (mx28.mail.ru [194.67.23.67]) by mx1.freebsd.org (Postfix) with ESMTP id 141EC13C442; Fri, 21 Dec 2007 00:55:46 +0000 (UTC) (envelope-from _pppp@mail.ru) Received: from f59.mail.ru (f59.mail.ru [194.67.57.93]) by mx28.mail.ru (mPOP.Fallback_MX) with ESMTP id 43D8E770F38; Thu, 20 Dec 2007 21:17:55 +0300 (MSK) Received: from mail by f59.mail.ru with local id 1J5Pxt-0004O8-00; Thu, 20 Dec 2007 21:17:49 +0300 Received: from [89.208.20.114] by koi.mail.ru with HTTP; Thu, 20 Dec 2007 21:17:49 +0300 From: dima <_pppp@mail.ru> To: Robert Watson Mime-Version: 1.0 X-Mailer: mPOP Web-Mail 2.19 X-Originating-IP: [89.208.20.114] Date: Thu, 20 Dec 2007 21:17:49 +0300 In-Reply-To: <20071220135342.O67327@fledge.watson.org> References: <20071220135342.O67327@fledge.watson.org> Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 8bit Message-Id: Cc: arch@FreeBSD.org, net@FreeBSD.org Subject: Re: TCP Projects for 8.0 - first cut wiki page X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dima <_pppp@mail.ru> List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Dec 2007 00:55:46 -0000 > Per earlier e-mail, I've created a page to track the various on-going > projects: > > http://wiki.freebsd.org/TCPProjects8 > > Rui has already kindly added the TCP ECN work to the page. As I know, we have a single swi:net thread in the kernel yet. Are there any plans to make several such threads? If yes, this activity isn't mentioned in wiki. There are 2 ideas: 1. per-core thread 2. 
per-interface thread I like the second more. Regards, Dmitriy Marchenko. From owner-freebsd-net@FreeBSD.ORG Fri Dec 21 11:31:22 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B415716A474 for ; Fri, 21 Dec 2007 11:31:22 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: from smtp4.poczta.interia.pl (smtp35.poczta.interia.pl [80.48.65.35]) by mx1.freebsd.org (Postfix) with ESMTP id 6BC1213C4DD for ; Fri, 21 Dec 2007 11:31:22 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: by smtp4.poczta.interia.pl (INTERIA.PL, from userid 502) id 620BB2848AE; Fri, 21 Dec 2007 12:31:20 +0100 (CET) Received: from f04.poczta.interia.pl (f04.poczta.interia.pl [10.217.2.4]) by smtp4.poczta.interia.pl (INTERIA.PL) with ESMTP id B98A3284814; Fri, 21 Dec 2007 12:31:19 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by f04.poczta.interia.pl (Postfix) with ESMTP id AC6813C2BA; Fri, 21 Dec 2007 12:31:18 +0100 (CET) Date: 21 Dec 2007 12:31:18 +0100 From: vermaden To: Eygene Ryabinkin MIME-Version: 1.0 Content-Type: TEXT/plain; CHARSET=ISO-8859-2 X-ORIGINATE-IP: 217.76.112.72 X-Mailer: PSE Message-Id: <20071221113118.AC6813C2BA@f04.poczta.interia.pl> X-EMID: adf40acc Cc: freebsd-net@freebsd.org Subject: Re: default route X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Dec 2007 11:31:22 -0000 > Tue, Dec 18, 2007 at 06:20:53PM +0100, vermaden wrote: > > > After reading this I feel that you have absolutely no packets on > > > either interfaces when your Linux box ping FreeBSD. But this > > > contradicts with your previous assertion that if ICMP packet comes > > > in on rl1, then it is reflected at rl0. Am I missing something? 
> > > > Yes I must mislook that, rl0 also is 'dead' while Linux box pings > > my FreeBSD box using net on rl1. > > OK, so I feel that there are two points to check. > > 1. Firewall. Even if you're running GENERIC, firewall thingies > are compiled as kernel modules and can be loaded by the startup > scripts. The output of 'kldstat -v' will show what modules > are loaded. BPF is run before filtering, so it sees packets > that firewall can drop. > > 2. Enable ICMP verbose mode in the kernel: set the variable > 'icmpprintfs' on the top of the /sys/netinet/ip_icmp.c > to 1 and define ICMPPRINTFS during kernel compilation via > 'makeoptions ICMPPRINTFS=1'. After this you should watch for > kernel messages with the 'icmp' at the beginning of the line. > > Hope this helps. > -- > Eygene First of all thanks for still trying to solve my problem. Ad 1. Firewall is not enabled/loaded, no firewall in kernel or as a module. Ad 2. Thanks for that option, I will try this after 26.12 (after christmas) I think and I will post the results here. Regards vermaden ---------------------------------------------------------------------- Tysiace smiesznych filmikow z sieci. 
Sprawdz >> http://link.interia.pl/f1ca7 From owner-freebsd-net@FreeBSD.ORG Fri Dec 21 20:09:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B20BB16A417; Fri, 21 Dec 2007 20:09:38 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 8989F13C461; Fri, 21 Dec 2007 20:09:38 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id CD0981A4D7C; Fri, 21 Dec 2007 12:08:10 -0800 (PST) Date: Fri, 21 Dec 2007 12:08:10 -0800 From: Alfred Perlstein To: David G Lawrence Message-ID: <20071221200810.GY16982@elvis.mu.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071219171331.GH25053@tnn.dglawrence.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Dec 2007 20:09:38 -0000 * David G Lawrence [071219 09:12] wrote: > > >Try it with "find / -type f >/dev/null" to duplicate the problem > > >almost > > >instantly. > > > > I was able to verify last night that (cd /; tar -cpf -) > all.tar would > > trigger the problem. I'm working getting a test running with > > David's ffs_sync() workaround now, adding a few counters there should > > get this narrowed down a little more. > > Unfortunately, the version of the patch that I sent out isn't going to > help your problem. 
It needs to yield at the top of the loop, but vp isn't > necessarily valid after the wakeup from the msleep. That's a problem that > I'm having trouble figuring out a solution to - the solutions that come > to mind will all significantly increase the overhead of the loop. > As a very inadequate work-around, you might consider lowering > kern.maxvnodes to something like 20000 - that might be low enough to > not trigger the problem, but also be high enough to not significantly > affect system I/O performance. I apologize for not reading the code as I am swamped, but a technique that Matt Dillon used for bufs might work here. Can you use a placeholder vnode as a place to restart the scan? you might have to mark it special so that other threads/things (getnewvnode()?) don't molest it, but it can provide for a convenient restart point. -- - Alfred Perlstein From owner-freebsd-net@FreeBSD.ORG Fri Dec 21 23:43:48 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCF5A16A418; Fri, 21 Dec 2007 23:43:48 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 8EBA413C4FB; Fri, 21 Dec 2007 23:43:48 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBLNhlEH044230; Fri, 21 Dec 2007 15:43:47 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBLNhlZD044229; Fri, 21 Dec 2007 15:43:47 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Fri, 21 Dec 2007 15:43:47 -0800 From: David G Lawrence To: Alfred Perlstein Message-ID: <20071221234347.GS25053@tnn.dglawrence.com> References: 
<20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071221200810.GY16982@elvis.mu.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Fri, 21 Dec 2007 15:43:47 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Dec 2007 23:43:49 -0000 > > Unfortunately, the version of the patch that I sent out isn't going to > > help your problem. It needs to yield at the top of the loop, but vp isn't > > necessarily valid after the wakeup from the msleep. That's a problem that > > I'm having trouble figuring out a solution to - the solutions that come > > to mind will all significantly increase the overhead of the loop. > > I apologize for not reading the code as I am swamped, but a technique > that Matt Dillon used for bufs might work here. > > Can you use a placeholder vnode as a place to restart the scan? > you might have to mark it special so that other threads/things > (getnewvnode()?) don't molest it, but it can provide for a convenient > restart point. That was one of the solutions that I considered and rejected since it would significantly increase the overhead of the loop. The solution provided by Kostik Belousov that uses uio_yield looks like a fine solution. I intend to try it out on some servers RSN. -DG David G. Lawrence President Download Technologies, Inc.
- http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 00:26:00 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA67B16A417; Sat, 22 Dec 2007 00:26:00 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 91BA213C45D; Sat, 22 Dec 2007 00:26:00 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 1AE051A4D7C; Fri, 21 Dec 2007 16:24:32 -0800 (PST) Date: Fri, 21 Dec 2007 16:24:32 -0800 From: Alfred Perlstein To: David G Lawrence Message-ID: <20071222002432.GK16982@elvis.mu.org> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071221234347.GS25053@tnn.dglawrence.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 00:26:00 -0000 * David G Lawrence [071221 15:42] wrote: > > > Unfortunately, the version of the patch that I sent out isn't going to > > > help your problem. It needs to yield at the top of the loop, but vp isn't > > > necessarily valid after the wakeup from the msleep. 
That's a problem that > > > I'm having trouble figuring out a solution to - the solutions that come > > > to mind will all significantly increase the overhead of the loop. > > > > I apologize for not reading the code as I am swamped, but a technique > > that Matt Dillon used for bufs might work here. > > > > Can you use a placeholder vnode as a place to restart the scan? > > you might have to mark it special so that other threads/things > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > restart point. > > That was one of the solutions that I considered and rejected since it > would significantly increase the overhead of the loop. > The solution provided by Kostik Belousov that uses uio_yield looks like > a fine solution. I intend to try it out on some servers RSN. Out of curiosity's sake, why would it make the loop slower? One would only add the placeholder when yielding, not for every iteration. -- - Alfred Perlstein From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 01:54:14 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CAD9116A41B; Sat, 22 Dec 2007 01:54:14 +0000 (UTC) (envelope-from davids@webmaster.com) Received: from mail1.webmaster.com (mail1.webmaster.com [216.152.64.169]) by mx1.freebsd.org (Postfix) with ESMTP id 9CE9B13C44B; Sat, 22 Dec 2007 01:54:14 +0000 (UTC) (envelope-from davids@webmaster.com) Received: from however by webmaster.com (MDaemon.PRO.v8.1.3.R) with ESMTP id md50001820320.msg; Fri, 21 Dec 2007 17:44:30 -0800 From: "David Schwartz" To: "Freebsd-Net@Freebsd.
Org" Date: Fri, 21 Dec 2007 17:43:09 -0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198 In-Reply-To: <20071221234347.GS25053@tnn.dglawrence.com> X-Authenticated-Sender: joelkatz@webmaster.com X-Spam-Processed: mail1.webmaster.com, Fri, 21 Dec 2007 17:44:30 -0800 (not processed: message from trusted or authenticated source) X-MDRemoteIP: 206.171.168.138 X-Return-Path: davids@webmaster.com X-MDAV-Processed: mail1.webmaster.com, Fri, 21 Dec 2007 17:44:32 -0800 Cc: freebsd-stable@freebsd.org Subject: RE: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: davids@webmaster.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 01:54:14 -0000 I'm just an observer, and I may be confused, but it seems to me that this is motion in the wrong direction (at least, it's not going to fix the actual problem). As I understand the problem, once you reach a certain point, the system slows down *every* 30.999 seconds. Now, it's possible for the code to cause one slowdown as it cleans up, but why does it need to clean up so much 31 seconds later? Why not find/fix the actual bug? Then work on getting the yield right if it turns out there's an actual problem for it to fix. If the problem is that too much work is being done at a stretch and it turns out this is because work is being done erroneously or needlessly, fixing that should solve the whole problem. Doing the work that doesn't need to be done more slowly is at best an ugly workaround. Or am I misunderstanding? 
DS From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 03:57:50 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 31F1C16A417 for ; Sat, 22 Dec 2007 03:57:50 +0000 (UTC) (envelope-from maf@splintered.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id C47D013C448 for ; Sat, 22 Dec 2007 03:57:49 +0000 (UTC) (envelope-from maf@splintered.net) Received: (qmail 59559 invoked from network); 22 Dec 2007 03:31:09 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 22 Dec 2007 03:31:09 -0000 In-Reply-To: <20071221234347.GS25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Fri, 21 Dec 2007 22:30:51 -0500 To: David G Lawrence X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, Alfred Perlstein , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 03:57:50 -0000 The uio_yield() idea did not work. Still have the same 31 second interval packet loss. Is it safe to assume the vp will be valid after a msleep() or uio_yield()? 
If so can we do something a little different:

Currently:

    /* this takes too long when list is large */
    MNT_VNODE_FOREACH(vp, mp, mvp) {
        do work
    }

Why not do this incrementally and call ffs_sync() more often, or
break it out into ffs_isync() (incremental sync)?

    static struct vnode *vp;

    /* first? */
    if (!vp)
        vp = __mnt_vnode_first(&mvp, mp);

    for (vcount = 0; vp && (vcount != 500); ++vcount) {
        do work
        vp = __mnt_vnode_next(&mvp, mp);
    }

The problem I see with this is a race condition where this list may
change between the incremental calls.

--
mark

On Dec 21, 2007, at 6:43 PM, David G Lawrence wrote:

>>> Unfortunately, the version of the patch that I sent out isn't going to
>>> help your problem. It needs to yield at the top of the loop, but vp isn't
>>> necessarily valid after the wakeup from the msleep. That's a problem that
>>> I'm having trouble figuring out a solution to - the solutions that come
>>> to mind will all significantly increase the overhead of the loop.
>>
>> I apologize for not reading the code as I am swamped, but a technique
>> that Matt Dillon used for bufs might work here.
>>
>> Can you use a placeholder vnode as a place to restart the scan?
>> you might have to mark it special so that other threads/things
>> (getnewvnode()?) don't molest it, but it can provide for a convenient
>> restart point.
>
> That was one of the solutions that I considered and rejected since it
> would significantly increase the overhead of the loop.
> The solution provided by Kostik Belousov that uses uio_yield looks like
> a fine solution. I intend to try it out on some servers RSN.
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 05:07:58 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7182916A417; Sat, 22 Dec 2007 05:07:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 1354213C448; Sat, 22 Dec 2007 05:07:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J5waY-000KaF-Ky; Sat, 22 Dec 2007 07:07:57 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBM57hHT070611; Sat, 22 Dec 2007 07:07:43 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBM57hwm070610; Sat, 22 Dec 2007 07:07:43 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 07:07:43 +0200 From: Kostik Belousov To: David Schwartz Message-ID: <20071222050743.GP57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="TRkqPRiqIDKgfg/F" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 1118cf21f7425d415ed61e91afc5cb63 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 
00000000 X-SpamTest-Info: Profiles 1938 [Dec 21 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: "Freebsd-Net@Freebsd. Org" , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 05:07:58 -0000 --TRkqPRiqIDKgfg/F Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Dec 21, 2007 at 05:43:09PM -0800, David Schwartz wrote: > > > I'm just an observer, and I may be confused, but it seems to me that this is > motion in the wrong direction (at least, it's not going to fix the actual > problem). As I understand the problem, once you reach a certain point, the > system slows down *every* 30.999 seconds. Now, it's possible for the code to > cause one slowdown as it cleans up, but why does it need to clean up so much > 31 seconds later? > > Why not find/fix the actual bug? Then work on getting the yield right if it > turns out there's an actual problem for it to fix. > > If the problem is that too much work is being done at a stretch and it turns > out this is because work is being done erroneously or needlessly, fixing > that should solve the whole problem. Doing the work that doesn't need to be > done more slowly is at best an ugly workaround. > > Or am I misunderstanding? Yes, rewriting the syncer is the right solution. It probably cannot be done quickly enough. If the yield workaround provides mitigation for now, it should go in.
--TRkqPRiqIDKgfg/F Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHbJueC3+MBN1Mb4gRAvPyAJ9Zp0lEBJmQvkFNRhu2hq/ABVh4qACfc8C0 K4g5W+0PuhHCJNCG9GrUwpw= =Hb5f -----END PGP SIGNATURE----- --TRkqPRiqIDKgfg/F-- From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 05:37:00 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A469716A419; Sat, 22 Dec 2007 05:37:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 4340813C457; Sat, 22 Dec 2007 05:37:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J5x2a-000Pvz-KS; Sat, 22 Dec 2007 07:36:59 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBM5anNG082478; Sat, 22 Dec 2007 07:36:49 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBM5amIA082477; Sat, 22 Dec 2007 07:36:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 07:36:48 +0200 From: Kostik Belousov To: Mark Fullmer Message-ID: <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> 
Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ubuMVesmirrCclZT" Content-Disposition: inline In-Reply-To: <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: b22dd54e5e410b0526d530b08df0ebb5 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1938 [Dec 21 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, Alfred Perlstein , freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 05:37:00 -0000 --ubuMVesmirrCclZT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: > The uio_yield() idea did not work. Still have the same 31 second > interval packet loss. Which patch did you use? Let's check whether the syncer is the culprit for you. Please change the value of syncdelay in sys/kern/vfs_subr.c (around line 238) from 30 to some other value, e.g., 45. After that, check the interval of the effect you have observed. It would be interesting to check whether completely disabling the syncer eliminates the packet loss, but such a system has to be operated with extreme caution. > > Is it safe to assume the vp will be valid after a msleep() or > uio_yield()? If No.
--ubuMVesmirrCclZT Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHbKJvC3+MBN1Mb4gRAnz/AKDTPhR99Hw1sOHoYcE66Zq2MNZe5ACeMB/H BzkSI7Ud0ro/w6gCAAvxSpY= =XQ4k -----END PGP SIGNATURE----- --ubuMVesmirrCclZT-- From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 05:37:40 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1996D16A419; Sat, 22 Dec 2007 05:37:40 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 9F86413C43E; Sat, 22 Dec 2007 05:37:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J5x3I-00003U-CG; Sat, 22 Dec 2007 07:37:38 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBM5bXmS082494; Sat, 22 Dec 2007 07:37:33 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBM5bW0G082493; Sat, 22 Dec 2007 07:37:32 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 07:37:32 +0200 From: Kostik Belousov To: Alfred Perlstein Message-ID: <20071222053732.GR57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <20071222002432.GK16982@elvis.mu.org> Mime-Version: 1.0 
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="soWJpSPh+l8Y6Fy7" Content-Disposition: inline In-Reply-To: <20071222002432.GK16982@elvis.mu.org> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: f2683d8df775da069012ccac6e6e6321 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1938 [Dec 21 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org, David G Lawrence Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 05:37:40 -0000 --soWJpSPh+l8Y6Fy7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Dec 21, 2007 at 04:24:32PM -0800, Alfred Perlstein wrote: > * David G Lawrence [071221 15:42] wrote: > > > > Unfortunately, the version of the patch that I sent out isn't going to > > > > help your problem. It needs to yield at the top of the loop, but vp isn't > > > > necessarily valid after the wakeup from the msleep. That's a problem that > > > > I'm having trouble figuring out a solution to - the solutions that come > > > > to mind will all significantly increase the overhead of the loop. > > > > > > I apologize for not reading the code as I am swamped, but a technique > > > that Matt Dillon used for bufs might work here. > > > > > > Can you use a placeholder vnode as a place to restart the scan?
> > > you might have to mark it special so that other threads/things > > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > > restart point. > > > > That was one of the solutions that I considered and rejected since it > > would significantly increase the overhead of the loop. > > The solution provided by Kostik Belousov that uses uio_yield looks like > > a fine solution. I intend to try it out on some servers RSN. > > Out of curiosity's sake, why would it make the loop slower? one > would only add the placeholder when yielding, not for every iteration. The marker is already reinserted into the list on every iteration. --soWJpSPh+l8Y6Fy7 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHbKKcC3+MBN1Mb4gRAoT1AKCBmHGiMjE36UzMRadhMvV4puYuSwCfQby3 vGqKmgClkrEd3/x3ytBSUao= =HhMC -----END PGP SIGNATURE----- --soWJpSPh+l8Y6Fy7-- From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 06:28:50 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 001AD16A418 for ; Sat, 22 Dec 2007 06:28:49 +0000 (UTC) (envelope-from maf@splintered.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id B5D3B13C4F2 for ; Sat, 22 Dec 2007 06:28:49 +0000 (UTC) (envelope-from maf@splintered.net) Received: (qmail 82942 invoked from network); 22 Dec 2007 06:28:49 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?)
(192.148.251.71) by sv1.eng.oar.net with SMTP; 22 Dec 2007 06:28:49 -0000 In-Reply-To: <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Sat, 22 Dec 2007 01:28:31 -0500 To: Kostik Belousov X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 06:28:50 -0000 On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: > On Fri, Dec 21, 2007 at 10:30:51PM -0500, Mark Fullmer wrote: >> The uio_yield() idea did not work. Still have the same 31 second >> interval packet loss. > What patch you have used ? 
This is hand applied from the diff you sent December 19, 2007 1:24:48 PM EST sr1400-ar0.eng:/usr/src/sys/ufs/ffs# diff -c ffs_vfsops.c ffs_vfsops.c.orig *** ffs_vfsops.c Fri Dec 21 21:08:39 2007 --- ffs_vfsops.c.orig Sat Dec 22 00:51:22 2007 *************** *** 1107,1113 **** struct ufsmount *ump = VFSTOUFS(mp); struct fs *fs; int error, count, wait, lockreq, allerror = 0; - int yield_count; int suspend; int suspended; int secondary_writes; --- 1107,1112 ---- *************** *** 1148,1154 **** softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps); MNT_ILOCK(mp); - yield_count = 0; MNT_VNODE_FOREACH(vp, mp, mvp) { /* * Depend on the mntvnode_slock to keep things stable enough --- 1147,1152 ---- *************** *** 1166,1177 **** (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && vp->v_bufobj.bo_dirty.bv_cnt == 0)) { VI_UNLOCK(vp); - if (yield_count++ == 100) { - MNT_IUNLOCK(mp); - yield_count = 0; - uio_yield(); - goto relock_mp; - } continue; } MNT_IUNLOCK(mp); --- 1164,1169 ---- *************** *** 1186,1192 **** if ((error = ffs_syncvnode(vp, waitfor)) != 0) allerror = error; vput(vp); - relock_mp: MNT_ILOCK(mp); } MNT_IUNLOCK(mp); --- 1178,1183 ---- > > Lets check whether the syncer is the culprit for you. > Please, change the value of the syncdelay at the sys/kern/vfs_subr.c > around the line 238 from 30 to some other value, e.g., 45. After that, > check the interval of the effect you have observed. Changed it to 13. Not sure if SYNCER_MAXDELAY also needed to be increased if syncdelay was increased. static int syncdelay = 13; /* max time to delay syncing data */ Test: ; use vnodes % find / -type f -print > /dev/null ; verify % sysctl vfs.numvnodes vfs.numvnodes: 32128 ; run packet loss test now have periodic loss every 13994633us (13.99 seconds). ; reduce # of vnodes with sysctl kern.maxvnodes=1000 test now runs clean. 
> > It would be interesting to check whether completely disabling the > syncer > eliminates the packet loss, but such system have to be operated with > extreme caution. > From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 06:50:30 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BDC5F16A469; Sat, 22 Dec 2007 06:50:30 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 933C413C4D5; Sat, 22 Dec 2007 06:50:30 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBM6oURH017372; Fri, 21 Dec 2007 22:50:30 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBM6oTKL017371; Fri, 21 Dec 2007 22:50:29 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Fri, 21 Dec 2007 22:50:29 -0800 From: David G Lawrence To: Mark Fullmer Message-ID: <20071222065029.GT25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Fri, 21 Dec 2007 22:50:30 
-0800 (PST) Cc: Kostik Belousov , freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 06:50:30 -0000 > >What patch you have used ? > > This is hand applied from the diff you sent December 19, 2007 1:24:48 > PM EST Mark, try the previous patch from Kostik - the one that does the one tick msleep. I think you'll find that that one does work. The likely problem with the second version is that uio_yield doesn't lower the priority enough for the other threads to run. Forcing it to msleep for a tick will eliminate priority from consideration. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
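For readers comparing the two patch variants discussed above, the batching pattern can be sketched in user space. This is only an analogy and not kernel code: a pthread mutex stands in for MNT_ILOCK/MNT_IUNLOCK, sched_yield() for uio_yield(), and a 1 ms nanosleep() for the one-tick msleep. The names scan_with_pauses, scan_stats and YIELD_EVERY are made up for illustration, and user-space scheduling does not reproduce the kernel priority behaviour described in the message.

```c
/* User-space sketch only; assumptions are noted in the comments. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

#define YIELD_EVERY 500		/* roughly the batch size used in the patch */

struct scan_stats {
	size_t visited;		/* items examined */
	size_t pauses;		/* times the lock was dropped */
};

/*
 * Walk n items with list_lock held, dropping the lock after every
 * YIELD_EVERY items.  If sleep_instead_of_yield is nonzero the thread
 * sleeps for about 1 ms (stand-in for a one-tick msleep); otherwise it
 * only offers the CPU with sched_yield() (stand-in for uio_yield()).
 */
static struct scan_stats
scan_with_pauses(pthread_mutex_t *list_lock, size_t n,
    int sleep_instead_of_yield)
{
	struct scan_stats st = { 0, 0 };
	const struct timespec one_tick = { 0, 1000000 };	/* 1 ms */
	size_t batch = 0;

	pthread_mutex_lock(list_lock);
	for (size_t i = 0; i < n; i++) {
		st.visited++;		/* the per-item check would go here */
		if (++batch == YIELD_EVERY) {
			batch = 0;
			pthread_mutex_unlock(list_lock);
			if (sleep_instead_of_yield)
				nanosleep(&one_tick, NULL);
			else
				sched_yield();
			pthread_mutex_lock(list_lock);
			st.pauses++;
		}
	}
	pthread_mutex_unlock(list_lock);
	return (st);
}
```

With the 32128 vnodes reported earlier in the thread, a scan batched this way would drop the lock 64 times. A yield lets another thread run only if the scheduler picks one, while a sleep takes the scanning thread off the CPU regardless of priority, which is the distinction the message draws.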
From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 07:03:27 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4AA816A41B; Sat, 22 Dec 2007 07:03:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 7B65F13C461; Sat, 22 Dec 2007 07:03:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J5yOH-000HOX-D6; Sat, 22 Dec 2007 09:03:26 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBM73IMb030376; Sat, 22 Dec 2007 09:03:18 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBM73IJ3030375; Sat, 22 Dec 2007 09:03:18 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 09:03:18 +0200 From: Kostik Belousov To: Mark Fullmer Message-ID: <20071222070318.GT57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="N7HXVILz59yg1nI8" Content-Disposition: inline In-Reply-To: 
<3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: ebef06da57613c13a9510a9a5e517e93 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1938 [Dec 21 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 07:03:28 -0000 --N7HXVILz59yg1nI8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Dec 22, 2007 at 01:28:31AM -0500, Mark Fullmer wrote: >=20 > On Dec 22, 2007, at 12:36 AM, Kostik Belousov wrote: > >Lets check whether the syncer is the culprit for you. > >Please, change the value of the syncdelay at the sys/kern/vfs_subr.c > >around the line 238 from 30 to some other value, e.g., 45. After that, > >check the interval of the effect you have observed. >=20 > Changed it to 13. Not sure if SYNCER_MAXDELAY also needed to be > increased if syncdelay was increased. >=20 > static int syncdelay =3D 13; /* max time to delay syncing = =20 > data */ >=20 > Test: >=20 > ; use vnodes > % find / -type f -print > /dev/null >=20 > ; verify > % sysctl vfs.numvnodes > vfs.numvnodes: 32128 >=20 > ; run packet loss test > now have periodic loss every 13994633us (13.99 seconds). >=20 > ; reduce # of vnodes with sysctl kern.maxvnodes=3D1000 > test now runs clean. 
Definitely syncer.

> >
> >It would be interesting to check whether completely disabling the
> >syncer
> >eliminates the packet loss, but such system have to be operated with
> >extreme caution.
Ok, no need to do this.
As Bruce Evans noted, there is a vfs_msync() that do almost the same
traversal of the vnodes. It was missed in the previous patch. Try this one.

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 3c2e1ed..6515d6a 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -2967,7 +2967,9 @@ vfs_msync(struct mount *mp, int flags)
 {
 	struct vnode *vp, *mvp;
 	struct vm_object *obj;
+	int yield_count;
 
+	yield_count = 0;
 	MNT_ILOCK(mp);
 	MNT_VNODE_FOREACH(vp, mp, mvp) {
 		VI_LOCK(vp);
@@ -2996,6 +2998,12 @@ vfs_msync(struct mount *mp, int flags)
 			MNT_ILOCK(mp);
 		} else
 			VI_UNLOCK(vp);
+		if (yield_count++ == 500) {
+			MNT_IUNLOCK(mp);
+			yield_count = 0;
+			uio_yield();
+			MNT_ILOCK(mp);
+		}
 	}
 	MNT_IUNLOCK(mp);
 }
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index cbccc62..9e8b887 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -1182,6 +1182,7 @@ ffs_sync(mp, waitfor, td)
 	int secondary_accwrites;
 	int softdep_deps;
 	int softdep_accdeps;
+	int yield_count;
 	struct bufobj *bo;
 
 	fs = ump->um_fs;
@@ -1216,6 +1217,7 @@ loop:
 	softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps);
 	MNT_ILOCK(mp);
 
+	yield_count = 0;
 	MNT_VNODE_FOREACH(vp, mp, mvp) {
 		/*
 		 * Depend on the mntvnode_slock to keep things stable enough
@@ -1233,6 +1235,12 @@ loop:
 		    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
 		    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
 			VI_UNLOCK(vp);
+			if (yield_count++ == 500) {
+				MNT_IUNLOCK(mp);
+				yield_count = 0;
+				uio_yield();
+				MNT_ILOCK(mp);
+			}
 			continue;
 		}
 		MNT_IUNLOCK(mp);

--N7HXVILz59yg1nI8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)
iD8DBQFHbLa1C3+MBN1Mb4gRAjZAAKDOZCUCmfjbFX61IvwpSDfMg8dTCgCgqxLE LsxaM+dv/WP5wHW2z1lYJZ8= =yi5I -----END PGP SIGNATURE----- --N7HXVILz59yg1nI8-- From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 07:09:38 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 391DD16A46C; Sat, 22 Dec 2007 07:09:38 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1316413C458; Sat, 22 Dec 2007 07:09:37 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBM79b5I029206; Fri, 21 Dec 2007 23:09:37 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBM79bC6029205; Fri, 21 Dec 2007 23:09:37 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Fri, 21 Dec 2007 23:09:37 -0800 From: David G Lawrence To: David Schwartz Message-ID: <20071222070937.GU25053@tnn.dglawrence.com> References: <20071221234347.GS25053@tnn.dglawrence.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Fri, 21 Dec 2007 23:09:37 -0800 (PST) Cc: "Freebsd-Net@Freebsd. 
Org" , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 07:09:38 -0000 > I'm just an observer, and I may be confused, but it seems to me that this is > motion in the wrong direction (at least, it's not going to fix the actual > problem). As I understand the problem, once you reach a certain point, the > system slows down *every* 30.999 seconds. Now, it's possible for the code to > cause one slowdown as it cleans up, but why does it need to clean up so much > 31 seconds later? > > Why not find/fix the actual bug? Then work on getting the yield right if it > turns out there's an actual problem for it to fix. > > If the problem is that too much work is being done at a stretch and it turns > out this is because work is being done erroneously or needlessly, fixing > that should solve the whole problem. Doing the work that doesn't need to be > done more slowly is at best an ugly workaround. > > Or am I misunderstanding? It's the syncer that is causing the problem, and it runs every 31 seconds. Historically, the syncer ran every 30 seconds, but things have changed a bit over time. The reason that the syncer takes so much time is that ffs_sync is a bit stupid in how it works - it loops through all of the vnodes on each ffs mountpoint (typically almost all of the vnodes in the system) to see if any of them need to be synced out. This was marginally okay when there were perhaps a thousand vnodes in the system, but when the maximum number of vnodes was dramatically increased in FreeBSD some years ago (to typically 50000-100000) and combined with kernel threads of FreeBSD 5, this has resulted in some rather bad side effects. 
I think the proper solution would be to create an ffs_sync work list (another TAILQ/LISTQ), probably with the head in the mountpoint struct, that has on it any vnodes that need to be synced. Unfortunately, such a change would be extensive, scattered throughout much of the ufs/ffs code. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 07:15:17 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C055C16A41B; Sat, 22 Dec 2007 07:15:17 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 97BCE13C461; Sat, 22 Dec 2007 07:15:17 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBM7FHrJ033326; Fri, 21 Dec 2007 23:15:17 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBM7FH0c033325; Fri, 21 Dec 2007 23:15:17 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Fri, 21 Dec 2007 23:15:17 -0800 From: David G Lawrence To: Kostik Belousov Message-ID: <20071222071517.GV25053@tnn.dglawrence.com> References: <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> 
<20071222070318.GT57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071222070318.GT57756@deviant.kiev.zoral.com.ua> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Fri, 21 Dec 2007 23:15:17 -0800 (PST) Cc: Mark Fullmer , freebsd-stable@freebsd.org, freebsd-net@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 07:15:17 -0000 > As Bruce Evans noted, there is a vfs_msync() that do almost the same > traversal of the vnodes. It was missed in the previous patch. Try this one. I forgot to comment on that when Bruce pointed that out. My solution has been to comment out the call to vfs_msync. :-) It comes into play when you have files modified through the mmap interface (kind of rare on most systems). Obviously I have mixed feelings about vfs_msync, but I'm not suggesting here that we should get rid of it as any sort of solution. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 07:32:37 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5999716A41B; Sat, 22 Dec 2007 07:32:37 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (static-72-90-113-2.ptldor.fios.verizon.net [72.90.113.2]) by mx1.freebsd.org (Postfix) with ESMTP id 2214113C447; Sat, 22 Dec 2007 07:32:37 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id lBM7WaXn044068; Fri, 21 Dec 2007 23:32:36 -0800 (PST) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id lBM7WaHO044067; Fri, 21 Dec 2007 23:32:36 -0800 (PST) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Fri, 21 Dec 2007 23:32:36 -0800 From: David G Lawrence To: Alfred Perlstein Message-ID: <20071222073236.GW25053@tnn.dglawrence.com> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <20071222002432.GK16982@elvis.mu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20071222002432.GK16982@elvis.mu.org> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Fri, 21 Dec 2007 23:32:36 -0800 (PST) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 07:32:37 -0000 > > > Can you use a placeholder vnode as a place to restart the scan? > > > you might have to mark it special so that other threads/things > > > (getnewvnode()?) don't molest it, but it can provide for a convenient > > > restart point. > > > > That was one of the solutions that I considered and rejected since it > > would significantly increase the overhead of the loop. > > The solution provided by Kostik Belousov that uses uio_yield looks like > > a fine solution. I intend to try it out on some servers RSN. > > Out of curiosity's sake, why would it make the loop slower? one > would only add the placeholder when yielding, not for every iteration. Actually, I misread your suggestion and was thinking marker flag, rather than placeholder vnode. Sorry about that. The current code actually already uses a marker vnode. It is hidden and obfuscated in the MNT_VNODE_FOREACH macro, further hidden in the __mnt_vnode_first/next functions, so it should be safe from vnode reclamation/free problems. -DG David G. Lawrence President Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500 The FreeBSD Project - http://www.freebsd.org Pave the road of life with opportunities. 
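The marker technique mentioned above can be illustrated with the TAILQ macros from sys/queue.h. This is a hedged user-space sketch, not the actual MNT_VNODE_FOREACH implementation: struct node, the is_marker flag and sum_with_marker are hypothetical names, and the kernel identifies its marker vnode by other means. The idea is that the scan parks a private marker entry right after the element it is working on, so that if the lock is dropped and the current element goes away, the scan can still resume from the marker.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
	TAILQ_ENTRY(node) link;
	int is_marker;		/* hypothetical flag: 1 for a scan marker */
	int value;
};
TAILQ_HEAD(node_list, node);

/*
 * Sum the values of all real nodes.  At every step a marker owned by
 * this scan is inserted after the current node; the next node is then
 * found from the marker rather than from the current node.
 */
static int
sum_with_marker(struct node_list *head)
{
	struct node marker = { .is_marker = 1 };
	struct node *np;
	int sum = 0;

	np = TAILQ_FIRST(head);
	while (np != NULL) {
		if (np->is_marker) {
			/* Skip markers left by other scans. */
			np = TAILQ_NEXT(np, link);
			continue;
		}
		TAILQ_INSERT_AFTER(head, np, &marker, link);
		sum += np->value;
		/*
		 * A real scan could unlock, yield and relock here: np may
		 * be freed meanwhile, but the marker stays on the list.
		 */
		np = TAILQ_NEXT(&marker, link);
		TAILQ_REMOVE(head, &marker, link);
	}
	return (sum);
}
```

Because the marker is reinserted on every iteration, yielding in the middle of the loop adds no extra list manipulation, which matches the observation made earlier in the thread.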
From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 17:08:15 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 509EF16A418; Sat, 22 Dec 2007 17:08:15 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id EA4AA13C46B; Sat, 22 Dec 2007 17:08:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBMH897M022272 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Dec 2007 04:08:11 +1100 Date: Sun, 23 Dec 2007 04:08:09 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kostik Belousov In-Reply-To: <20071222050743.GP57756@deviant.kiev.zoral.com.ua> Message-ID: <20071223032944.G48303@delplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: "Freebsd-Net@Freebsd. Org" , David Schwartz , freebsd-stable@FreeBSD.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 17:08:15 -0000 On Sat, 22 Dec 2007, Kostik Belousov wrote: > On Fri, Dec 21, 2007 at 05:43:09PM -0800, David Schwartz wrote: >> >> I'm just an observer, and I may be confused, but it seems to me that this is >> motion in the wrong direction (at least, it's not going to fix the actual >> problem). 
As I understand the problem, once you reach a certain point, the >> system slows down *every* 30.999 seconds. Now, it's possible for the code to >> cause one slowdown as it cleans up, but why does it need to clean up so much >> 31 seconds later? It is just searching for things to clean up, and doing this pessimally due to unnecessary cache misses and (more recently) introduction of overheads to handling the case where the mount point is locked into the fast path where the mount point is not unlocked. The search every 30 seconds or so is probably more efficient, and is certainly simpler, than managing the list on every change to every vnode for every file system. However, it gives a high latency in non-preemptible kernels. >> Why not find/fix the actual bug? Then work on getting the yield right if it >> turns out there's an actual problem for it to fix. Yielding is probably the correct fix for non-preemptible kernels. Some operations just take a long time, but are low priority so they can be preempted. This operation is partly under user control, since any user can call sync(2) and thus generate the latency every seconds. But this is no worse than a user generating even larger blocks of latency by reading huge amounts from /dev/zero. My old latency workaround for the latter (and other huge i/o's) is still sort of necessary, though it now works bogusly (hogticks doesn't work since it is reset on context switches to interrupt handlers; however, any context switch mostly fixes the problem). My old latency workaround only reduces the latency to a multiple of 1/HZ, so a default of 200 ms, so it still is supposed to allow latencies much larger than the ones that cause problems here, but its bogus current operation tends to give latencies of more like 1/HZ which is short enough when HZ has its default misconfiguration to 1000. 
I still don't understand the original problem, that the kernel is not even preemptible enough for network interrupts to work (except in 5.2 where Giant breaks things). Perhaps I misread the problem, and it is actually that networking works but userland is unable to run in time to avoid packet loss. >> If the problem is that too much work is being done at a stretch and it turns >> out this is because work is being done erroneously or needlessly, fixing >> that should solve the whole problem. Doing the work that doesn't need to be >> done more slowly is at best an ugly workaround. Lots of necessary work is being done. > Yes, rewriting the syncer is the right solution. It probably cannot be done > quickly enough. If the yield workaround provide mitigation for now, it > shall go in. I don't think rewriting the syncer just for this is the right solution. Rewriting the syncer so that it schedules actual i/o more efficiently might involve a solution. Better scheduling would probably take more CPU and increase the problem. Note that MNT_VNODE_FOREACH() is used 17 times, so the yielding fix is needed in 17 places if it isn't done internally in MNT_VNODE_FOREACH(). 
There are 4 places in vfs and 13 places in 6 file systems: % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(xvp, mp, mvp) { % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./fs/msdosfs/msdosfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, nvp) { % ./fs/coda/coda_subr.c: MNT_VNODE_FOREACH(vp, mp, nvp) { % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./kern/vfs_default.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./nfs4client/nfs4_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { % ./nfsclient/nfs_subs.c: MNT_VNODE_FOREACH(vp, mp, nvp) { % ./nfsclient/nfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { Only file systems that support writing need it (for VOP_SYNC() and for MNT_RELOAD), else there would be many more places. There would also be more places if MNT_RELOAD support were not missing for some file systems. Bruce From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 18:02:29 2007 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E3FC16A420 for ; Sat, 22 Dec 2007 18:02:29 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id 36C8313C468 for ; Sat, 22 Dec 2007 18:02:29 +0000 (UTC) (envelope-from maf@eng.oar.net) Received: (qmail 7917 invoked from network); 22 Dec 2007 18:02:28 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) 
(192.148.251.71) by sv1.eng.oar.net with SMTP; 22 Dec 2007 18:02:28 -0000 In-Reply-To: <20071223032944.G48303@delplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <985A3F99-B3F4-451E-BD77-E2EB4351E323@eng.oar.net> Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Sat, 22 Dec 2007 13:02:12 -0500 To: Bruce Evans X-Mailer: Apple Mail (2.752.3) Cc: Kostik Belousov , freebsd-net@FreeBSD.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 18:02:29 -0000 On Dec 22, 2007, at 12:08 PM, Bruce Evans wrote: > > I still don't understand the original problem, that the kernel is not > even preemptible enough for network interrupts to work (except in 5.2 > where Giant breaks things). Perhaps I misread the problem, and it is > actually that networking works but userland is unable to run in time > to avoid packet loss. > The test is done with UDP packets between two servers. The em driver is incrementing the received packet count correctly but the packet is not making it up the network stack. If the application was not servicing the socket fast enough I would expect to see the "dropped due to full socket buffers" (udps_fullsock) counter incrementing, as shown by netstat -s. I grab a copy of netstat -s, netstat -i, and netstat -m before and after testing. Other than the link packets counter, I haven't seen any other indication of where the packet is getting lost. The em driver has a debugging stats option which does not indicate receive side overflows. 
I'm fairly certain this same behavior can be seen with the fxp driver, but I'll need to double check. These are results I sent a few days ago after setting up a test without an ethernet switch between the sender and receiver. The switch was originally used to verify the sender was actually transmitting. With spanning tree, ethernet keepalives, and CDP (cisco proprietary neighbor protocol) disabled and static ARP entries on the sender and receiver I can account for all packets making it to the receiver. ## > Back to back test with no ethernet switch between two em interfaces, > same result. The receiving side has been up > 1 day and exhibits > the problem. These are also two different servers. The small > gettimeofday() syscall tester also shows the same ~30 > second pattern of high latency between syscalls. > > Receiver test application reports 3699 missed packets > > Sender netstat -i: > > (before test) > em1 1500 00:04:23:cf:51:b7 20 0 > 15975785 0 0 > em1 1500 10.1/24 10.1.0.2 37 - > 15975801 - - > > (after test) > em1 1500 00:04:23:cf:51:b7 22 0 > 25975822 0 0 > em1 1500 10.1/24 10.1.0.2 39 - > 25975838 - - > > total IP packets sent in during test = end - start > 25975838-15975801 = 10000037 (expected, 1,000,000 packets test + > overhead) > > Receiver netstat -i: > > (before test) > em1 1500 00:04:23:c4:cc:89 15975785 0 > 21 0 0 > em1 1500 10.1/24 10.1.0.1 15969626 - > 19 - - > > (after test) > em1 1500 00:04:23:c4:cc:89 25975822 0 > 23 0 0 > em1 1500 10.1/24 10.1.0.1 25965964 - > 21 - - > > total ethernet frames received during test = end - start > 25975822-15975785 = 10000037 (as expected) > > total IP packets processed during test = end - start > 25965964-15969626 = 9996338 (expecting 10000037) > > Missed packets = expected - received > 10000037-9996338 = 3699 > > netstat -i accounts for the 3699 missed packets also reported by the > application > > Looking closer at the tester output again shows the periodic > ~30 second windows of packet loss. 
> > There's a second problem here in that packets are just disappearing > before they make it to ip_input(), or there's a dropped packets > counter I've not found yet. > > I can provide remote access to anyone who wants to take a look, this > is very easy to duplicate. The ~ 1 day uptime before the behavior > surfaces is not making this easy to isolate. From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 18:23:40 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 257A916A420 for ; Sat, 22 Dec 2007 18:23:40 +0000 (UTC) (envelope-from maf@splintered.net) Received: from sv1.eng.oar.net (sv1.eng.oar.net [192.148.251.86]) by mx1.freebsd.org (Postfix) with SMTP id D7B0F13C447 for ; Sat, 22 Dec 2007 18:23:39 +0000 (UTC) (envelope-from maf@splintered.net) Received: (qmail 11183 invoked from network); 22 Dec 2007 18:23:38 -0000 Received: from dev1.eng.oar.net (HELO ?127.0.0.1?) (192.148.251.71) by sv1.eng.oar.net with SMTP; 22 Dec 2007 18:23:38 -0000 In-Reply-To: <20071222070318.GT57756@deviant.kiev.zoral.com.ua> References: <20071217102433.GQ25053@tnn.dglawrence.com> <20071220011626.U928@besplex.bde.org> <814DB7A9-E64F-4BCA-A502-AB5A6E0297D3@eng.oar.net> <20071219171331.GH25053@tnn.dglawrence.com> <20071221200810.GY16982@elvis.mu.org> <20071221234347.GS25053@tnn.dglawrence.com> <6D374B4C-0D98-4916-A762-7A85912B3058@splintered.net> <20071222053648.GQ57756@deviant.kiev.zoral.com.ua> <3647BB78-BA10-432B-A52B-04E402E155CC@splintered.net> <20071222070318.GT57756@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: Mark Fullmer Date: Sat, 22 Dec 2007 13:23:22 -0500 To: Kostik Belousov X-Mailer: Apple Mail (2.752.3) Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 18:23:40 -0000 This appears to work. No packet loss with vfs.numvnodes at 32132, 16K PPS test with 1 million packets. I'll run some additional tests bringing vfs.numvnodes closer to kern.maxvnodes. On Dec 22, 2007, at 2:03 AM, Kostik Belousov wrote: > > As Bruce Evans noted, there is a vfs_msync() that do almost the same > traversal of the vnodes. It was missed in the previous patch. Try > this one. > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index 3c2e1ed..6515d6a 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -2967,7 +2967,9 @@ vfs_msync(struct mount *mp, int flags) > { > struct vnode *vp, *mvp; > struct vm_object *obj; > + int yield_count; > > + yield_count = 0; > MNT_ILOCK(mp); > MNT_VNODE_FOREACH(vp, mp, mvp) { > VI_LOCK(vp); > @@ -2996,6 +2998,12 @@ vfs_msync(struct mount *mp, int flags) > MNT_ILOCK(mp); > } else > VI_UNLOCK(vp); > + if (yield_count++ == 500) { > + MNT_IUNLOCK(mp); > + yield_count = 0; > + uio_yield(); > + MNT_ILOCK(mp); > + } > } > MNT_IUNLOCK(mp); > } > diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c > index cbccc62..9e8b887 100644 > --- a/sys/ufs/ffs/ffs_vfsops.c > +++ b/sys/ufs/ffs/ffs_vfsops.c > @@ -1182,6 +1182,7 @@ ffs_sync(mp, waitfor, td) > int secondary_accwrites; > int softdep_deps; > int softdep_accdeps; > + int yield_count; > struct bufobj *bo; > > fs = ump->um_fs; > @@ -1216,6 +1217,7 @@ loop: > softdep_get_depcounts(mp, &softdep_deps, &softdep_accdeps); > MNT_ILOCK(mp); > > + yield_count = 0; > MNT_VNODE_FOREACH(vp, mp, mvp) { > /* > * Depend on the mntvnode_slock to keep things stable enough > @@ -1233,6 +1235,12 @@ loop: > (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 && > vp->v_bufobj.bo_dirty.bv_cnt == 0)) { > VI_UNLOCK(vp); > + 
if (yield_count++ == 500) { > + MNT_IUNLOCK(mp); > + yield_count = 0; > + uio_yield(); > + MNT_ILOCK(mp); > + } > continue; > } > MNT_IUNLOCK(mp); From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 20:16:26 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A45D16A417; Sat, 22 Dec 2007 20:16:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay02.kiev.sovam.com (relay02.kiev.sovam.com [62.64.120.197]) by mx1.freebsd.org (Postfix) with ESMTP id 2441113C455; Sat, 22 Dec 2007 20:16:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=deviant.kiev.zoral.com.ua) by relay02.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1J6Ald-000LAK-OO; Sat, 22 Dec 2007 22:16:24 +0200 Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id lBMKGE83034471; Sat, 22 Dec 2007 22:16:14 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id lBMKGEiI034470; Sat, 22 Dec 2007 22:16:14 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Dec 2007 22:16:13 +0200 From: Kostik Belousov To: Bruce Evans Message-ID: <20071222201613.GX57756@deviant.kiev.zoral.com.ua> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="0k4Rxg87Lb8yV0u3" Content-Disposition: inline In-Reply-To: <20071223032944.G48303@delplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Scanner-Signature: 69ad68b945e79e087f1af101bf3fefec X-DrWeb-checked: yes 
X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 1946 [Dec 22 2007] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: "Freebsd-Net@Freebsd. Org" , David Schwartz , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 20:16:26 -0000 --0k4Rxg87Lb8yV0u3 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote: > On Sat, 22 Dec 2007, Kostik Belousov wrote: > >Yes, rewriting the syncer is the right solution. It probably cannot be done > >quickly enough. If the yield workaround provide mitigation for now, it > >shall go in. > > I don't think rewriting the syncer just for this is the right solution. > Rewriting the syncer so that it schedules actual i/o more efficiently > might involve a solution. Better scheduling would probably take more > CPU and increase the problem. I think that we can easily predict what vnode(s) become dirty at the places where we do vn_start_write(). > > Note that MNT_VNODE_FOREACH() is used 17 times, so the yielding fix is > needed in 17 places if it isn't done internally in MNT_VNODE_FOREACH(). 
> There are 4 places in vfs and 13 places in 6 file systems: > > % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(xvp, mp, mvp) { > % ./ufs/ffs/ffs_snapshot.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ffs/ffs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./ufs/ufs/ufs_quota.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./fs/msdosfs/msdosfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./fs/coda/coda_subr.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./gnu/fs/ext2fs/ext2_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_default.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./kern/vfs_subr.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./nfs4client/nfs4_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > % ./nfsclient/nfs_subs.c: MNT_VNODE_FOREACH(vp, mp, nvp) { > % ./nfsclient/nfs_vfsops.c: MNT_VNODE_FOREACH(vp, mp, mvp) { > > Only file systems that support writing need it (for VOP_SYNC() and for > MNT_RELOAD), else there would be many more places. There would also > be more places if MNT_RELOAD support were not missing for some file > systems. Ok, since you talked about this first :). I already made the following patch, but did not publish it since I have not yet inspected all callers of MNT_VNODE_FOREACH() for the safety of dropping the mount interlock. It should be safe, but it is better to check. Also, I postponed the check until it was reported that yielding does solve the original problem. 
diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c index 14acc5b..046af82 100644 --- a/sys/kern/vfs_mount.c +++ b/sys/kern/vfs_mount.c @@ -1994,6 +1994,12 @@ __mnt_vnode_next(struct vnode **mvp, struct mount *mp) mtx_assert(MNT_MTX(mp), MA_OWNED); KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch")); + if ((*mvp)->v_yield++ == 500) { + MNT_IUNLOCK(mp); + (*mvp)->v_yield = 0; + uio_yield(); + MNT_ILOCK(mp); + } vp = TAILQ_NEXT(*mvp, v_nmntvnodes); while (vp != NULL && vp->v_type == VMARKER) vp = TAILQ_NEXT(vp, v_nmntvnodes); diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index dc70417..6e3119b 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -131,6 +131,7 @@ struct vnode { struct socket *vu_socket; /* v unix domain net (VSOCK) */ struct cdev *vu_cdev; /* v device (VCHR, VBLK) */ struct fifoinfo *vu_fifoinfo; /* v fifo (VFIFO) */ + int vu_yield; /* yield count (VMARKER) */ } v_un; /* @@ -185,6 +186,7 @@ struct vnode { #define v_socket v_un.vu_socket #define v_rdev v_un.vu_cdev #define v_fifoinfo v_un.vu_fifoinfo +#define v_yield v_un.vu_yield /* XXX: These are temporary to avoid a source sweep at this time */ #define v_object v_bufobj.bo_object --0k4Rxg87Lb8yV0u3 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (FreeBSD) iD8DBQFHbXCNC3+MBN1Mb4gRAl8eAJ9DvGYrFBcvBUeaesQfI8K8NZa8CwCfabpZ P1ojIQjyRhEbd8gCeutenLM= =t5ni -----END PGP SIGNATURE----- --0k4Rxg87Lb8yV0u3-- From owner-freebsd-net@FreeBSD.ORG Sat Dec 22 23:20:37 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6AE3016A417; Sat, 22 Dec 2007 23:20:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 143A113C44B; Sat, 22 Dec 2007 
23:20:36 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lBMNKVRO030159 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Dec 2007 10:20:33 +1100 Date: Sun, 23 Dec 2007 10:20:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kostik Belousov In-Reply-To: <20071222201613.GX57756@deviant.kiev.zoral.com.ua> Message-ID: <20071223095314.G1323@delplex.bde.org> References: <20071221234347.GS25053@tnn.dglawrence.com> <20071222050743.GP57756@deviant.kiev.zoral.com.ua> <20071223032944.G48303@delplex.bde.org> <20071222201613.GX57756@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: "Freebsd-Net@Freebsd. Org" , David Schwartz , freebsd-stable@freebsd.org Subject: Re: Packet loss every 30.999 seconds X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Dec 2007 23:20:37 -0000 On Sat, 22 Dec 2007, Kostik Belousov wrote: > On Sun, Dec 23, 2007 at 04:08:09AM +1100, Bruce Evans wrote: >> On Sat, 22 Dec 2007, Kostik Belousov wrote: >>> Yes, rewriting the syncer is the right solution. It probably cannot be done >>> quickly enough. If the yield workaround provide mitigation for now, it >>> shall go in. >> >> I don't think rewriting the syncer just for this is the right solution. >> Rewriting the syncer so that it schedules actual i/o more efficiently >> might involve a solution. Better scheduling would probably take more >> CPU and increase the problem. > I think that we can easily predict what vnode(s) become dirty at the > places where we do vn_start_write(). This works for writes to regular files at most. 
There are also reads (for ffs, these set IN_ATIME unless the file system is mounted with noatime) and directory operations. By grepping for IN_CHANGE, I get 78 places in ffs alone where dirtying of the inode occurs or is scheduled to occur (ffs = /sys/ufs). The efficiency of "marking" timestamps, especially for atimes, depends on just setting a flag in normal operation and picking up coalesced settings of the flag later, often at sync time by scanning all vnodes. >> Note that MNT_VNODE_FOREACH() is used 17 times, so the yielding fix is >> needed in 17 places if it isn't done internally in MNT_VNODE_FOREACH(). >> There are 4 places in vfs and 13 places in 6 file systems: >> ... >> >> Only file systems that support writing need it (for VOP_SYNC() and for >> MNT_RELOAD), else there would be many more places. There would also >> be more places if MNT_RELOAD support were not missing for some file >> systems. > > Ok, since you talked about this first :). I already made the following > patch, but did not publish it since I have not yet inspected all > callers of MNT_VNODE_FOREACH() for the safety of dropping the mount interlock. > It should be safe, but it is better to check. Also, I postponed the check > until it was reported that yielding does solve the original problem. Good. I'd still like to unobfuscate the function call. > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c > index 14acc5b..046af82 100644 > --- a/sys/kern/vfs_mount.c > +++ b/sys/kern/vfs_mount.c > @@ -1994,6 +1994,12 @@ __mnt_vnode_next(struct vnode **mvp, struct mount *mp) > mtx_assert(MNT_MTX(mp), MA_OWNED); > > KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch")); > + if ((*mvp)->v_yield++ == 500) { > + MNT_IUNLOCK(mp); > + (*mvp)->v_yield = 0; > + uio_yield(); Another unobfuscation is to not name this uio_yield(). 
> + MNT_ILOCK(mp); > + } > vp = TAILQ_NEXT(*mvp, v_nmntvnodes); > while (vp != NULL && vp->v_type == VMARKER) > vp = TAILQ_NEXT(vp, v_nmntvnodes); > diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h > index dc70417..6e3119b 100644 > --- a/sys/sys/vnode.h > +++ b/sys/sys/vnode.h > @@ -131,6 +131,7 @@ struct vnode { > struct socket *vu_socket; /* v unix domain net (VSOCK) */ > struct cdev *vu_cdev; /* v device (VCHR, VBLK) */ > struct fifoinfo *vu_fifoinfo; /* v fifo (VFIFO) */ > + int vu_yield; /* yield count (VMARKER) */ > } v_un; > > /* Putting the count in the union seems fragile at best. Even if nothing can access the marker vnode, you need to context-switch its old contents while using it for the count, in case its old contents are used. Vnode-printing routines might still be confused. Bruce
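For readers following the thread, the pattern under discussion (drop the list lock every N iterations, yield, retake the lock) can be modelled in userland. What follows is only a sketch under stated assumptions, not the kernel code: a pthread mutex stands in for the mount interlock, sched_yield() stands in for uio_yield(), and the names struct walker, walker_next(), count_dirty(), list_lock and YIELD_INTERVAL are hypothetical. The yield counter lives in separate per-traversal state rather than in a union inside the list element, which sidesteps the fragility raised above; that placement is a choice of this sketch, not something proposed in the thread. The sketch also assumes nothing modifies the list while the lock is dropped, which is the job the marker vnode does in the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define YIELD_INTERVAL 500	/* same threshold as the patch */

/* Hypothetical stand-in for a vnode on a mount's list. */
struct node {
	struct node *next;
	int dirty;
};

/* Hypothetical per-traversal state; holds the yield counter. */
struct walker {
	struct node *cur;	/* next node to hand out */
	int since_yield;	/* iterations since the last yield */
	int yields;		/* number of times the lock was dropped */
};

/* Stand-in for the mount interlock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the next node, or NULL at the end of the list.
 * Must be called with list_lock held; may drop and retake it.
 */
static struct node *
walker_next(struct walker *w)
{
	struct node *n;

	assert(w != NULL);
	if (w->since_yield++ == YIELD_INTERVAL) {
		pthread_mutex_unlock(&list_lock);
		w->since_yield = 0;
		w->yields++;
		sched_yield();	/* let other runnable threads in */
		pthread_mutex_lock(&list_lock);
	}
	n = w->cur;
	if (n != NULL)
		w->cur = n->next;
	return (n);
}

/* Count dirty nodes, yielding periodically during the scan. */
static int
count_dirty(struct node *head, int *yields_out)
{
	struct walker w = { .cur = head, .since_yield = 0, .yields = 0 };
	struct node *n;
	int dirty = 0;

	pthread_mutex_lock(&list_lock);
	while ((n = walker_next(&w)) != NULL) {
		if (n->dirty)
			dirty++;
	}
	pthread_mutex_unlock(&list_lock);
	if (yields_out != NULL)
		*yields_out = w.yields;
	return (dirty);
}
```

As in the patch, the post-increment compare followed by a reset means the lock is actually dropped once every 501 calls, not every 500.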