From owner-freebsd-net@FreeBSD.ORG Mon Sep 13 18:49:26 2010
From: Pyun YongHyeon
Date: Mon, 13 Sep 2010 11:48:33 -0700
To: Tom Judge
Message-ID: <20100913184833.GF1229@michelle.cdnetworks.com>
References:
 <4C894A76.5040200@tomjudge.com>
 <20100910002439.GO7203@michelle.cdnetworks.com>
 <4C8E3D79.6090102@tomjudge.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4C8E3D79.6090102@tomjudge.com>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-net@freebsd.org, davidch@broadcom.com, yongari@freebsd.org
Subject: Re: bce(4) - com_no_buffers (Again)
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Mon, 13 Sep 2010 18:49:26 -0000

On Mon, Sep 13, 2010 at 10:04:25AM -0500, Tom Judge wrote:
> On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> > On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
> >
> >> Hi,
> >> I am just following up on the thread from March (I think) about this issue.
> >>
> >> We are seeing this issue on a number of systems running 7.1.
> >>
> >> The systems in question are all Dell:
> >>
> >> * R710 R610 R410
> >> * PE2950
> >>
> >> The latter do not show the issue as much as the R series systems.
> >>
> >> The cards in one of the R610's that I am testing with are:
> >>
> >> bce0@pci0:1:0:0: class=0x020000 card=0x02361028 chip=0x163914e4
> >> rev=0x20 hdr=0x00
> >>     vendor   = 'Broadcom Corporation'
> >>     device   = 'NetXtreme II BCM5709 Gigabit Ethernet'
> >>     class    = network
> >>     subclass = ethernet
> >>
> >> They are connected to Dell PowerConnect 5424 switches.
> >>
> >> uname -a:
> >> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
> >> #3: Wed Sep 8 08:19:03 UTC 2010
> >> tj@dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10 amd64
> >>
> >> We are also using 8192 byte jumbo frames, if_lagg and if_vlan in the
> >> configuration (the nics are in promisc as we are currently capturing
> >> netflow data on another vlan for diagnostic purposes):
> >>
> >>
> >>
> >> I have updated the bce driver and the Broadcom MII driver to the
> >> version from stable/7 and am still seeing the issue.
> >>
> >> This morning I did a test with increasing the RX_PAGES to 8 but the
> >> system just hung starting the network. The route command got stuck in a
> >> zone state (Sorry can't remember exactly which).
> >>
> >> The real question is, how do we go about increasing the number of RX
> >> BDs? I guess we have to bump more than just RX_PAGES...
> >>
> >>
> >> The cause for us, from what we can see, is the openldap server sending
> >> large group search results back to nss_ldap or pam_ldap. When it does
> >> this it seems to send each of the 600 results in its own TCP segment
> >> creating a small packet storm (600*~100byte PDU's) at the destination
> >> host. The kernel then retransmits 2 blocks of 100 results each after
> >> SACK kicks in for the data that was dropped by the NIC.
> >>
> >>
> >> Thanks in advance
> >>
> >> Tom
> >>
> >>
> >>
>
> > FW may drop incoming frames when it does not see available RX
> > buffers. Increasing the number of RX buffers slightly reduces the
> > possibility of dropping frames but it wouldn't completely fix it.
> > Alternatively, the driver may tell FW about available RX buffers in
> > the middle of RX ring processing instead of giving updated buffers
> > at the end of RX processing. This way FW may see available RX
> > buffers while the driver/upper stack is busy processing received
> > frames. But this may introduce coherency issues because the RX ring
> > is shared between host and FW. If FreeBSD had a way to sync a
> > partial region of a DMA map, this could be implemented without fear
> > of coherency issues.
> > Another way to improve RX performance would be switching to
> > multi-RX queue with RSS but that would require a lot of work and I
> > had no time to implement it.
> >
>
> Does this mean that these cards are going to perform badly? This is
> what I gathered from the previous thread.
>

I mean there is still a lot of room for improvement in the driver
for better performance. bce(4) controllers are among the best
controllers for servers, and the driver does not take full advantage
of them.

> > BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> > wonder why TX/RX flow controls were not kicked in.
> >
>
> The working copy I used for grabbing the upstream source is at r212371.
>
> Last changes for the directories in my working copy:
>
> sys/dev/bce @ 211388
> sys/dev/mii @ 212020
>
>
> I discovered that flow control was disabled on the switches, so I set it
> to auto and added a pair of BCE_PRINTF's in the code where it enables
> and disables flow control and now it gets enabled.
>

Ok.

>
> Without BCE_JUMBO_HDRSPLIT we see no errors. With it we see a number
> of errors, however the rate seems to be reduced compared to the
> previous version of the driver.
>

It seems there are issues in header splitting and it was disabled
by default. Header splitting reduces packet processing overhead in
the upper layer so it's normal to see better performance with header
splitting.
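
To illustrate the earlier point about telling FW about available RX
buffers in the middle of RX ring processing rather than only at the
end, here is a rough toy model in plain C. It is not bce(4) code and
does not touch any driver API. The ring size, arrival rate, drain rate
and batch size are all made-up numbers, and simulate() is a
hypothetical helper written only for this sketch. Only the burst of
600 frames is taken from the LDAP case described in this thread.

```c
#include <assert.h>

#define RING_SIZE 200   /* hypothetical number of RX descriptors */
#define BURST     600   /* frames in one burst, as in the LDAP case */
#define ARRIVE    4     /* assumed frames arriving per tick */
#define DRAIN     3     /* assumed frames the host processes per tick */

/*
 * Toy model of one RX burst. Returns the number of dropped frames.
 *
 * batch > 0:  processed buffers are handed back to the NIC after
 *             every 'batch' frames (mid-ring update).
 * batch == 0: processed buffers are handed back only when the
 *             current RX pass is finished (end-of-pass update).
 */
static int
simulate(int batch)
{
	int avail = RING_SIZE;  /* buffers the NIC may still fill */
	int filled = 0;         /* frames waiting for the host */
	int pending = 0;        /* processed, not yet given back */
	int pass_left = 0;      /* frames left in the current RX pass */
	int to_arrive = BURST;
	int dropped = 0;
	int i;

	while (to_arrive > 0 || filled > 0 || pending > 0) {
		/* NIC side: accept a frame if a buffer is free, else drop. */
		for (i = 0; i < ARRIVE && to_arrive > 0; i++, to_arrive--) {
			if (avail > 0) {
				avail--;
				filled++;
			} else
				dropped++;
		}

		/* Host side: a new pass covers everything filled so far. */
		if (pass_left == 0)
			pass_left = filled;
		for (i = 0; i < DRAIN && pass_left > 0; i++) {
			filled--;
			pass_left--;
			pending++;
			if (batch > 0 && pending >= batch) {
				avail += pending;   /* mid-ring update */
				pending = 0;
			}
		}
		if (pass_left == 0 && pending > 0) {
			avail += pending;       /* end-of-pass update */
			pending = 0;
		}

		/* Every buffer is in exactly one of the three states. */
		assert(avail + filled + pending == RING_SIZE);
	}
	return (dropped);
}
```

With these assumed numbers, simulate(0) (end-of-pass update) drops
frames while simulate(8) (update every 8 frames) does not, because
buffers that were already processed are no longer held back until the
pass ends. The model ignores the coherency problem of the shared RX
ring mentioned above, and it says nothing about how a real controller
would behave.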