Date: Thu, 3 Apr 2008 05:21:05 -0700
From: "Peter Wemm" <peter@wemm.org>
To: "David Christensen" <davidch@broadcom.com>
Cc: "cvs-src@freebsd.org" <cvs-src@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "cvs-all@freebsd.org" <cvs-all@freebsd.org>
Subject: Re: cvs commit: src/sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h
Message-ID: <e7db6d980804030521h7af1ecd4pc492caa0bce14138@mail.gmail.com>
In-Reply-To: <e7db6d980803311234q1602d7a4ncf9e3fadaa5f679b@mail.gmail.com>
References: <200802220046.m1M0kMPM008814@repoman.freebsd.org> <e7db6d980803310032p7b3eec0et355919afdb147218@mail.gmail.com> <5D267A3F22FD854F8F48B3D2B523819324EF633FCC@IRVEXCHCCR01.corp.ad.broadcom.com> <e7db6d980803311234q1602d7a4ncf9e3fadaa5f679b@mail.gmail.com>
On Mon, Mar 31, 2008 at 12:34 PM, Peter Wemm <peter@wemm.org> wrote:
>
> On Mon, Mar 31, 2008 at 12:13 PM, David Christensen
> <davidch@broadcom.com> wrote:
> > > On Thu, Feb 21, 2008 at 5:46 PM, David Christensen
> > > <davidch@freebsd.org> wrote:
> > > > Modified files:
> > > >   sys/dev/bce  if_bce.c if_bcefw.h if_bcereg.h
> > > > Log:
> > > > MFC after: 4 weeks
> > > >
> > > > - Added loose RX MTU functionality to allow frames larger than
> > > >   1500 bytes to be accepted even though the interface MTU is set
> > > >   to 1500.
> > > > - Implemented new TCP header splitting/jumbo frame support which
> > > >   uses two chains for receive traffic rather than the original
> > > >   single receive chain.
> > > > - Added additional debug support code.
> > > >
> > > > Revision  Changes      Path
> > > > 1.36      +1559 -675   src/sys/dev/bce/if_bce.c
> > > > 1.5       +6179 -4850  src/sys/dev/bce/if_bcefw.h
> > > > 1.17      +264 -55     src/sys/dev/bce/if_bcereg.h
> > >
> > > This has been devastating on the freebsd.org cluster.
> > >
> > > Attached are three test runs. I've done a cold reboot, then 'cd
> > > /usr/src/sys' and run a 'cvs -Rq update' where the CVSROOT is over
> > > nfs.
> > >
> > > First, the old driver:
> > > svn# time cvs -Rq up
> > > 0.890u 4.577s 1:14.48 7.3% 669+2315k 7379+0io 10094pf+0w
> > >
> > > Now, the same test again, but with this change included in the
> > > kernel:
> > > svn# time cvs -Rq up
> > > 0.940u 359.906s 7:01.04 85.7% 648+2242k 7365+0io 10082pf+0w
> > >
> > > Note the massive increase (nearly 100 times) in system time, and
> > > the almost 7-fold increase in wall clock time.
> > >
> > > Turning on promisc mode helps a lot, but doesn't solve it. (This
> > > was found when ps@ was using tcpdump to try to figure out what the
> > > problem was.)
> >
> > The change is needed to update the FreeBSD driver so that it can
> > continue using production firmware for the controllers. The previous
> > firmware was specific to FreeBSD and was not being maintained.
> >
> > I didn't see any performance issues running with netperf. Is the NFS
> > traffic UDP or TCP? What's the MTU in use? How much system memory is
> > available?
>
> NFS over UDP. We're also seeing problems with NIS/YP (also UDP) on
> the box with the driver active. The MTU is the standard 1500. Both
> machines have 8GB of ram. Both are 64 bit kernels. Client is a Dell
> 2950 (2 x quad core2), the server is an HP DL385 (quad opteron with
> bge).
>
> > If this is a performance problem then the first place I would look
> > is at the definitions of rx_bd_mbuf_alloc_size and
> > pg_bd_mbuf_alloc_size. The older version of the driver would use
> > multiple 2KB buffers (MCLBYTES in size) from a single chain when
> > building a packet, so you would typically have a single mbuf cluster
> > passed to the stack. The new firmware uses two chains, each of which
> > may be a different size. The current implementation will use MHLEN
> > bytes for the rx chain and MCLBYTES for the pg chain. When a packet
> > is received the hardware will place as much data as possible into a
> > single mbuf in the rx chain, then place any remaining data into one
> > or more mbufs in the pg chain. The driver will then stitch together
> > the mbufs before passing them up the stack. This process is supposed
> > to improve performance for TCP because the TCP payload will be split
> > from the TCP header and should be quicker to access.
> >
> > A quick test would be to set rx_bd_mbuf_alloc_size to MCLBYTES,
> > which should for the most part duplicate the older behavior. The
> > driver will still allocate more mbufs, which might be a problem if
> > system memory is already low. Is anyone else aware of a driver that
> > does TCP header splitting? It's typical on the TX side to see a
> > packet with two or three mbufs in a chain, but I suspect it's less
> > typical on the RX side, which could be part of the problem.
>
> The one thing that I'm very sure of is that system memory isn't low
> on either machine. The extraordinary increase in accumulated system
> time of the process makes me wonder if something odd is going on with
> the TX path. When sending packets, the network stack and driver code
> path execution times are charged to the user process doing the
> writes. On the receive side, the cpu time will be accumulated in
> either the driver ithread or taskqueue, or the netisr kthread. To be
> honest, I hadn't been looking to see if excessive cpu time was
> accumulating there, but I did notice that the system's load average
> was over 2.0 for the duration of the 'cvs update' on an otherwise
> idle machine. This suggests to me that both send and receive were
> bogging down somehow.
>
> Perhaps it is something silly like a spin lock being triggered?
>
> > > Here's the same test, with the new driver, and promisc mode on:
> > > svn# ifconfig bce0 promisc
> > > svn# time cvs -Rq up
> > > 0.967u 50.919s 2:13.97 38.7% 650+2250k 7379+0io 10094pf+0w
> > >
> > > It is better.. Only double the wall clock time, but still over 10
> > > times as much system time.
> >
> > It's not clear to me why promiscuous mode would make a difference
> > here, as that should only affect which packets are accepted by the
> > MAC. Is there any teaming or VLANs used in your configuration?
> > The RX MTU settings shouldn't be affected by promiscuous mode.
>
> There is nothing special going on. Just a plain gige cable to a cisco
> gige switch. I have no explanation for the promisc thing - one of the
> freebsd.org admins thought the problem was with YP/NIS. He started up
> a tcpdump to observe the NIS interactions during ssh login, and the
> problem mostly went away.
>
> BTW; I did the test twice. I ran the machine with cvs HEAD, and
> backed the driver out to before the commit.
> I also tried a RELENG_7 kernel, and then put the HEAD bce driver on
> 7.x - the problem goes with the bce driver change in both 7.x and
> 8.x/HEAD.
>
> There will be 4 more of these machines online sometime today (7.x and
> 8.x, both 32 and 64 bit). We can experiment with those at will.
>
> > > So please, don't MFC until this is solved..
> >
> > I haven't yet, as I've received reports from a few other people that
> > they're having problems, though they're functional problems and not
> > performance issues.

On 8.0/i386, with PAE enabled, I get messages on the console and the
system hangs when trying to do an nfs mount. Backing out the driver
fixes it. The same driver doesn't cause quite as spectacular a failure
on 8.0/amd64, but it isn't exactly happy..

Additional IP options:.^M
Mounting NFS file systebcms:e1: link state changed to UP^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
[..forever..]

NFS over UDP, fwiw. Server is a netapp.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
**WANTED TO BUY: Garmin Streetpilot 2650 or 2660. Not later model!**