From: John Baldwin
To: Bruce Evans
Cc: freebsd-bugs@FreeBSD.org, Oleg Tarasov
Date: Wed, 30 Mar 2005 15:52:02 -0500
Subject: Re: sio interrupt-level buffer overflows
Message-Id: <200503301552.02472.jhb@FreeBSD.org>

On Wednesday 30 March 2005 01:06 am, Bruce Evans wrote:
> On Wed, 23 Mar 2005, Oleg Tarasov wrote:
> >
> > About my panics. They persist and when this server panics it somehow
> > overloads my network so it stops functioning until reboot. This is
> > very, very bad.
> >
> > Maybe you could tell me where to write, or you could
> > personally tell me what should I do.
> >
> > Using all my theoretical skills I have come to this data I could
> > obtain from my dump:
> >
> > (kgdb) backtrace
> > #0 doadump () at pcpu.h:159
> > #1 0xc060b063 in boot (howto=260)
> >     at /usr/src/sys/kern/kern_shutdown.c:397
> > #2 0xc060b389 in panic (fmt=0xc080321d "spin lock held too long")
> >     at /usr/src/sys/kern/kern_shutdown.c:553
> > #3 0xc060270c in _mtx_lock_spin (m=0xc08d7800, td=0xc19ca320, opts=0,
> >     file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:613
> > #4 0xc077c165 in siointr (arg=0xc1ab8800)
> >     at /usr/src/sys/dev/sio/sio.c:1710
> > #5 0xc0790ead in intr_execute_handlers (isrc=0xc19b8890,
> >     iframe=0xd541ac94) at /usr/src/sys/i386/i386/intr_machdep.c:203
> > #6 0xc07932be in lapic_handle_intr (frame=
> >     {if_vec = 52, if_fs = -717160424, if_es = -1067384816, if_ds = 16,
> >     if_edi = -1046699232, if_esi = -1064591424, if_ebp = -717116188,
> >     if_ebx = -1046425600, if_edx = -1064566184, if_ecx = 0,
> >     if_eax = -1046425600, if_eip = -1067440569, if_cs = 8,
> >     if_eflags = 582, if_esp = -1045200000, if_ss = 4})
> >     at /usr/src/sys/i386/i386/local_apic.c:490
> > #7 0xc078d753 in Xapic_isr1 () at apic_vector.s:110
> > #8 0x00000034 in ?? ()
> > #9 0xd5410018 in ?? ()
> > #10 0xc0610010 in coredump (td=0xc08b9fc0) at vnode_if.h:1244
> > #11 0xc05f6f46 in ithread_loop (arg=0xc1981c80)
> >     at /usr/src/sys/kern/kern_intr.c:546
> > #12 0xc05f6001 in fork_exit (callout=0xc05f6df8 ,
> >     arg=0xc1981c80, frame=0xd541ad48)
> >     at /usr/src/sys/kern/kern_fork.c:811
> > #13 0xc078d3fc in fork_trampoline ()
> >     at /usr/src/sys/i386/i386/exception.s:209
> > ...
>
> I couldn't figure out the problem from this.
> Your later mail says that the problem is caused by ppp not being MPSAFE,
> at least with sio, so I won't do much more with this stack trace, but I
> wonder about some of the strange entries in it:
>
> #13 - #11 are normal.
> #10 is weird. ithread_loop() shouldn't call coredump().
> #8 - #9 seem to be more like stack garbage than module addresses.
> #7 is normal, but it looks like someone broke stack traces for interrupts,
>    giving the garbage in #8 - #10.

This is weird as we do match on Xapic_isr as being an interrupt frame. I'm
not sure why that didn't work correctly.

> #0 - #6 are normal if the spin lock is already held by the same CPU that
>    is handling the interrupt (except this can't happen :-). I wouldn't
>    have thought that broken locking in ppp could cause this.

It's also normal if another CPU is holding the lock and spins with it for
some reason.

> Bruce

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
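
To make the two explanations for frames #0 - #6 concrete, here is a rough
user-space sketch of a spin lock with a bounded spin count. It is not the
code behind _mtx_lock_spin(); struct spin_lock, spin_lock_bounded() and the
spin limit are names made up for this illustration, and the sketch returns
-1 at the point where the kernel would panic with "spin lock held too long".

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCK_FREE (-1)          /* hypothetical "no owner" marker */

/* Hypothetical user-space stand-in for a kernel spin mutex. */
struct spin_lock {
    atomic_int owner;           /* CPU id of the holder, or LOCK_FREE */
};

static void spin_lock_init(struct spin_lock *l)
{
    atomic_init(&l->owner, LOCK_FREE);
}

/*
 * Try to take the lock for `cpu`, giving up after `max_spins` attempts.
 * Returns 0 on success, or -1 if the lock stayed held the whole time
 * (where a kernel would panic rather than return).
 */
static int spin_lock_bounded(struct spin_lock *l, int cpu, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        int expected = LOCK_FREE;
        /* Succeeds only if nobody owns the lock right now. */
        if (atomic_compare_exchange_strong(&l->owner, &expected, cpu))
            return 0;
    }
    return -1;
}

static void spin_unlock(struct spin_lock *l, int cpu)
{
    /* Only the current holder may release the lock. */
    assert(atomic_load(&l->owner) == cpu);
    atomic_store(&l->owner, LOCK_FREE);
}
```

If CPU 0 takes the lock and then calls spin_lock_bounded() again, as an
interrupt handler on the same CPU would, the second call can never succeed
and runs out of spins. A call from CPU 1 fails the same way for as long as
CPU 0 keeps holding the lock, which is the second case above.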