From: "Peter Wemm"
To: "David Christensen"
Cc: "cvs-src@freebsd.org", "src-committers@freebsd.org", "cvs-all@freebsd.org"
Date: Thu, 3 Apr 2008 05:21:05 -0700
Subject: Re: cvs commit: src/sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h

On Mon, Mar 31, 2008 at 12:34 PM, Peter Wemm wrote:
>
> On Mon, Mar 31, 2008 at 12:13 PM, David Christensen
> wrote:
> > > On Thu, Feb 21, 2008 at 5:46 PM, David Christensen
> > > wrote:
> > > >   Modified files:
> > > >     sys/dev/bce          if_bce.c if_bcefw.h if_bcereg.h
> > > >   Log:
> > > >   MFC after: 4 weeks
> > > >
> > > >   - Added loose RX MTU functionality to allow frames larger than
> > > >     1500 bytes to be accepted
> > > >     even though the interface MTU is set to 1500.
> > > >   - Implemented new TCP header splitting/jumbo frame support which
> > > >     uses two chains for receive traffic rather than the original
> > > >     single receive chain.
> > > >   - Added additional debug support code.
> > > >
> > > >   Revision  Changes      Path
> > > >   1.36      +1559 -675   src/sys/dev/bce/if_bce.c
> > > >   1.5       +6179 -4850  src/sys/dev/bce/if_bcefw.h
> > > >   1.17      +264 -55     src/sys/dev/bce/if_bcereg.h
> > >
> > > This has been devastating on the freebsd.org cluster.
> > >
> > > Attached are three test runs. I did a cold reboot, then ran
> > > 'cd /usr/src/sys' and 'cvs -Rq update' with the CVSROOT mounted
> > > over NFS.
> > >
> > > First, the old driver:
> > > svn# time cvs -Rq up
> > > 0.890u 4.577s 1:14.48 7.3% 669+2315k 7379+0io 10094pf+0w
> > >
> > > Now, the same test again, but with this change included in the kernel:
> > > svn# time cvs -Rq up
> > > 0.940u 359.906s 7:01.04 85.7% 648+2242k 7365+0io 10082pf+0w
> > >
> > > Note the massive increase (nearly 80 times) in system time,
> > > and the almost 6-fold increase in wall clock time.
> > >
> > > Turning on promisc mode helps a lot, but doesn't solve it. (This was
> > > found when ps@ was using tcpdump to try and figure out what the
> > > problem was.)
> >
> > The change is needed to update the FreeBSD driver so that it can
> > continue using production firmware for the controllers. The previous
> > firmware was specific to FreeBSD and was not being maintained.
> >
> > I didn't see any performance issues running with netperf. Is the NFS
> > traffic UDP or TCP? What's the MTU in use? How much system memory is
> > available?
>
> NFS over UDP. We're also seeing problems with NIS/YP (also UDP) on
> the box with the driver active. The MTU is the standard 1500. Both
> machines have 8GB of RAM. Both run 64-bit kernels. The client is a
> Dell 2950 (2 x quad Core 2); the server is an HP DL385 (quad Opteron
> with bge).
>
> > If this is a performance problem then the first place I would look is
> > in the definitions for rx_bd_mbuf_alloc_size and pg_bd_mbuf_alloc_size.
> > The older version of the driver would use multiple 2KB buffers
> > (MCLBYTES in size) from a single chain when building a packet, so you
> > would typically have a single mbuf cluster passed to the stack. The
> > new firmware uses two chains, each of which may be a different size.
> > The current implementation will use MHLEN bytes for the rx chain and
> > MCLBYTES for the pg chain. When a packet is received, the hardware will
> > place as much data as possible into a single mbuf in the rx chain,
> > then place any remaining data into one or more mbufs in the pg chain.
> > The driver will then stitch together the mbufs before passing them up
> > the stack. This process is supposed to improve performance for TCP
> > because the TCP payload will be split from the TCP header and should
> > be quicker to access.
> >
> > A quick test would be to set rx_bd_mbuf_alloc_size to MCLBYTES, which
> > should for the most part duplicate the older behavior. The driver
> > will still allocate more mbufs, which might be a problem if system
> > memory is already low. Is anyone else aware of a driver that does
> > TCP header splitting? It's typical on the TX side to see a packet
> > with two or three mbufs in a chain, but I suspect it's less typical
> > on the RX side, which could be part of the problem.
>
> The one thing that I'm very sure of is that system memory isn't low
> on either machine. The extraordinary increase in accumulated system
> time of the process makes me wonder if something odd is going on with
> the TX path. When sending packets, the network stack and driver code
> path execution times are charged to the user process doing the writes.
> On the receive side, the CPU time will be accumulated in either the
> driver ithread or taskqueue, or the netisr kthread.
> To be honest, I hadn't been looking to see if excessive CPU time was
> accumulating there, but I did notice that the system's load average
> was over 2.0 for the duration of the 'cvs update' on an otherwise idle
> machine. This suggests to me that both send and receive were bogging
> down somehow.
>
> Perhaps it is something silly like a spin lock being triggered?
>
> > >
> > > Here's the same test, with the new driver, and promisc mode on:
> > > svn# ifconfig bce0 promisc
> > > svn# time cvs -Rq up
> > > 0.967u 50.919s 2:13.97 38.7% 650+2250k 7379+0io 10094pf+0w
> > >
> > > It is better. Only double the wall clock time, but still over 10
> > > times as much system time.
> > >
> >
> > It's not clear to me why promiscuous mode would make a difference
> > here, as that should only affect which packets are accepted by the
> > MAC. Are you using any teaming or VLANs in your configuration?
> > The RX MTU settings shouldn't be affected by promiscuous mode.
>
> There is nothing special going on. Just a plain GigE cable to a Cisco
> GigE switch. I have no explanation for the promisc thing - one of the
> freebsd.org admins thought the problem was with YP/NIS. He started up
> a tcpdump to observe the NIS interactions during ssh login, and the
> problem mostly went away.
>
> BTW, I did the test twice. I ran the machine with CVS HEAD, and with
> the driver backed out to before the commit. I also tried a RELENG_7
> kernel, and then put the HEAD bce driver on 7.x - the problem follows
> the bce driver change in both 7.x and 8.x/HEAD.
>
> There will be 4 more of these machines online sometime today (7.x and
> 8.x, both 32 and 64 bit). We can experiment with those at will.
>
> > >
> > > So please, don't MFC until this is solved.
> >
> > I haven't yet, as I've received reports from a few other people that
> > they're having problems, though they're functional problems and not
> > performance issues.

On 8.0/i386, with PAE enabled, I get messages on the console and the
system hangs when trying to do an NFS mount. Backing out the driver
fixes it. The same driver doesn't cause quite as spectacular a failure
on 8.0/amd64, but it isn't exactly happy.

Additional IP options:.
Mounting NFS file systebcms:e1: link state changed to UP
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)
[..forever..]

NFS over UDP, fwiw. Server is a NetApp.

--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com