From: "David Christensen" <davidch@broadcom.com>
To: "Charles Sprickman", "YongHyeon PYUN"
Cc: "freebsd-net@freebsd.org"
Date: Fri, 8 Jul 2011 11:00:23 -0700
Subject: RE: bce packet loss
List-Id: Networking and TCP/IP with FreeBSD

> I was able to reproduce the drops in very large numbers on the internal
> network today.
> I simply scp'd some large files from 1000/FD hosts to a 100/FD host
> (i.e. scp bigfile.tgz oldhost.i:/dev/null). Immediately the 1000/FD
> hosts sending the files showed massive amounts of drops on the switch.
> This makes me suspect that this switch might be garbage in that it
> doesn't have enough buffer space to handle sending large amounts of
> traffic from the GigE ports to the FE ports without randomly dropping
> packets. Granted, I don't really understand how a "good" switch does
> this either; I would have thought TCP just took care of throttling
> itself.

If you have flow control enabled end-to-end I wouldn't expect to see
such behavior; frames should not be dropped. If you're seeing drops at
the switch, then I'd suspect that the traffic source connected to that
switch doesn't honor flow control. Check whether either the switch or
the traffic source keeps statistics on flow control frames
generated/received.

> Bear in mind that on the external switch our port to our ISP, which is
> the destination of almost all the traffic, is 100/FD and not 1000/FD.
>
> This of course does not explain why the original setup, where I'd locked
> the switch ports and the host ports to 100/FD, showed the same behavior.
>
> I'm stumped.
>
> We are running 8.1, am I correct in that flow control is not implemented
> there? We do have an 8.2-STABLE image from a month or so ago that we
> are testing with zfs v28; might that implement flow control?

Flow control support depends on the NIC driver implementation. Older
versions of the bce(4) firmware will rarely generate pause frames
(frames would be dropped by the firmware, but statistics should show
the drops occurring), and the firmware should always honor pause frames
from the link partner when flow control is enabled.

> Although reading this:
>
> http://en.wikipedia.org/wiki/Ethernet_flow_control
>
> It sounds like flow control is not terribly optimal since it forces the
> host to block all traffic.
> Not sure if this means drops are eliminated, reduced, or just shuffled
> around.

When congestion is detected, the switch should buffer up to a certain
limit (say 80% full) and then start sending pause frames to avoid
dropping frames. This affects all hosts connecting through the switch,
so congestion at one host can spread to other hosts (see
http://www.ieee802.org/3/cm_study/public/september04/thaler_3_0904.pdf).
Small networks with a few hosts should be OK with flow control, but if
you have dozens of switches and hundreds of hosts then it's not a good
idea.

Dave
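
[Editor's note: the XOFF/XON buffering behavior Dave describes can be
sketched as a toy model. The queue size, thresholds, and rates below are
illustrative assumptions, not values from bce(4) or any real switch; the
point is only to show how pause frames trade tail drops for backpressure
on the sender.]

```python
# Toy model of a switch egress queue with IEEE 802.3x-style flow control.
# All numbers are illustrative assumptions, not real switch parameters.

QUEUE_LIMIT = 100      # frames the egress port can buffer
XOFF_THRESHOLD = 80    # ~80% full: ask the sender to pause
XON_THRESHOLD = 40     # drained enough: let the sender resume

def run(flow_control, arrivals_per_tick=3, departures_per_tick=1, ticks=200):
    """Simulate a fast sender feeding a slow port; return (drops, pause_frames)."""
    queue = 0
    paused = False
    drops = 0
    pause_frames = 0
    for _ in range(ticks):
        # Sender offers frames unless it has been paused by the switch.
        if not (flow_control and paused):
            for _ in range(arrivals_per_tick):
                if queue < QUEUE_LIMIT:
                    queue += 1
                else:
                    drops += 1          # tail drop: no flow control backpressure
        queue = max(0, queue - departures_per_tick)
        if flow_control:
            if queue >= XOFF_THRESHOLD and not paused:
                paused = True
                pause_frames += 1       # switch emits a PAUSE (XOFF) frame
            elif queue <= XON_THRESHOLD and paused:
                paused = False
                pause_frames += 1       # zero-quanta PAUSE (XON) resumes sender
    return drops, pause_frames

print("no flow control:  ", run(flow_control=False))
print("with flow control:", run(flow_control=True))
```

Without flow control the queue overflows and frames are tail-dropped;
with it, the switch pauses the sender before the buffer fills, so drops
go to zero at the cost of stalling the sender (and, on a real network,
anything else funneling through that switch).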