From owner-freebsd-net@FreeBSD.ORG Thu Nov 24 01:40:27 2005
X-Original-To: net@freebsd.org
Delivered-To: freebsd-net@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 41C1016A41F;
    Thu, 24 Nov 2005 01:40:27 +0000 (GMT)
    (envelope-from mv@roq.com)
Received: from vault.mel.jumbuck.com (ppp166-27.static.internode.on.net [150.101.166.27])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 28BEB43D5E;
    Thu, 24 Nov 2005 01:40:25 +0000 (GMT)
    (envelope-from mv@roq.com)
Received: from vault.mel.jumbuck.com (localhost [127.0.0.1])
    by vault.mel.jumbuck.com (Postfix) with ESMTP id 796F98A065;
    Thu, 24 Nov 2005 12:40:17 +1100 (EST)
Received: from [192.168.46.52] (unknown [192.168.46.250])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by vault.mel.jumbuck.com (Postfix) with ESMTP id 470878A023;
    Thu, 24 Nov 2005 12:40:17 +1100 (EST)
Message-ID: <43851A08.5080802@roq.com>
Date: Thu, 24 Nov 2005 12:40:24 +1100
From: Michael Vince
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.12) Gecko/20051110
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Kris Kennaway
References: <20051123030304.GA84202@xor.obsecurity.org>
    <20051123084653.GA90927@xor.obsecurity.org>
In-Reply-To: <20051123084653.GA90927@xor.obsecurity.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV using ClamSMTP
Cc: net@freebsd.org, current@freebsd.org, John Polstra
Subject: Re: em interrupt storm
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Thu, 24 Nov 2005 01:40:27 -0000

Kris Kennaway wrote:

>On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
>
>>On 23-Nov-2005 Kris Kennaway wrote:
>>
>>>I am seeing the em driver undergoing an
>>>interrupt storm whenever the
>>>amr driver receives interrupts. In this case I was running newfs on
>>>the amr array and em0 was not in use:
>>>
>>>   28 root        1 -68 -187     0K     8K CPU1   1   0:32 53.98% irq16: em0
>>>   36 root        1 -64 -183     0K     8K RUN    1   0:37 27.75% irq24: amr0
>>>
>>># vmstat -i
>>>interrupt              total    rate
>>>irq1: atkbd0               2       0
>>>irq4: sio0               199       1
>>>irq6: fdc0                32       0
>>>irq13: npx0                1       0
>>>irq14: ata0               47       0
>>>irq15: ata1              931       5
>>>irq16: em0           6321801   37187
>>>irq24: amr0            28023     164
>>>cpu0: timer           337533    1985
>>>cpu1: timer           337285    1984
>>>Total                7025854   41328
>>>
>>>When newfs finished (i.e. amr was idle), em0 stopped storming.
>>>
>>>MPTable:
>>>
>>
>>This is the dreaded interrupt aliasing problem that several of us have
>>experienced with this chipset. High-numbered interrupts alias down to
>>interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
>>than the original interrupt.
>>
>>Nobody knows what causes it, and nobody knows how to fix it.
>>
>
>This would be good to document somewhere so that people don't either
>accidentally buy this hardware, or know what to expect when they run
>it.
>
>Kris
>

This is Intel's latest server chipset design, and Dell are putting that
chipset in all their servers. Luckily I haven't seen the problem on any
of my Dell servers (as long as I am reading this right).

This server has been running for a long time.

vmstat -i
interrupt              total    rate
irq1: atkbd0               6       0
irq4: sio0             23433       0
irq6: fdc0                10       0
irq8: rtc         2631238611     128
irq13: npx0                1       0
irq14: ata0               99       0
irq16: uhci0      1507608958      73
irq18: uhci2        42005524       2
irq19: uhci1               3       0
irq23: atapci0           151       0
irq46: amr0         41344088       2
irq64: em0        1513106157      73
irq0: clk         2055605782      99
Total             7790932823     379

This one just transferred over 8 gigs of data in 77 seconds with around
1000 simultaneous TCP connections under a load of 35. Both seem OK.

vmstat -i
interrupt              total    rate
irq4: sio0               315       0
irq13: npx0                1       0
irq14: ata0               47       0
irq16: uhci0         2894669       2
irq18: uhci2          977413       0
irq23: ehci0               3       0
irq46: amr0           883138       0
irq64: em0           2890414       2
cpu0: timer       2763566717    1999
cpu3: timer       2763797300    1999
cpu1: timer       2763551479    1999
cpu2: timer       2763797870    1999
Total            11062359366    8004

Mike
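
For anyone who wants to check their own box against the aliasing rule John describes in the quoted text (a high-numbered interrupt showing up on an interrupt in 16..23 that is a multiple of 8 lower), here is a rough Python sketch. It is only an illustration under assumptions: the function names parse_vmstat_i and alias_candidates are made up for this example, the line pattern assumes `vmstat -i` rows shaped like the ones in this thread, and a flagged pair is merely a candidate worth looking at, not proof of a storm.

```python
import re

# Assumed row shape, taken from the listings in this thread:
#   "irq<N>: <device> <total> <rate>"
IRQ_LINE = re.compile(r"^\s*irq(\d+):\s+(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {irq_number: (device, total, rate)} for the irqN rows."""
    irqs = {}
    for line in text.splitlines():
        m = IRQ_LINE.match(line)
        if m:
            irqs[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)))
    return irqs


def alias_candidates(irqs, low_range=range(16, 24)):
    """Return (high, low) pairs where low is in low_range and
    high - low is a positive multiple of 8.

    The default range 16..23 is the wider of the two guesses in the
    quoted description; pass range(16, 20) for the narrower one.
    """
    pairs = []
    for high in sorted(irqs):
        for low in sorted(irqs):
            if low in low_range and high > low and (high - low) % 8 == 0:
                pairs.append((high, low))
    return pairs


# Kris's listing from the quoted message.
SAMPLE = """\
interrupt              total    rate
irq1: atkbd0               2       0
irq4: sio0               199       1
irq6: fdc0                32       0
irq13: npx0                1       0
irq14: ata0               47       0
irq15: ata1              931       5
irq16: em0           6321801   37187
irq24: amr0            28023     164
cpu0: timer           337533    1985
cpu1: timer           337285    1984
"""

if __name__ == "__main__":
    table = parse_vmstat_i(SAMPLE)
    for high, low in alias_candidates(table):
        print("irq%d (%s) could alias down to irq%d (%s)"
              % (high, table[high][0], low, table[low][0]))
```

On Kris's listing this flags the single pair irq24 (amr0) and irq16 (em0), which matches the storm he reported. Comparing the rates of the two interrupts in a flagged pair is left to the reader.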