Date: Wed, 29 Jul 2009 04:43:36 +0200
Message-ID: <3bbf2fe10907281943m2392a9f9w7c69303e6c3b91d0@mail.gmail.com>
From: Attilio Rao
To: Stefan Bethke
Cc: FreeBSD Current, Giovanni Trematerra, Dan Naumov, barbara, "Bjoern A. Zeeb", Robert Watson, "C. C. Tang"
Subject: Re: spinlock held too long on reboot

2009/5/23 Stefan Bethke:
> I wrote:
>
>> Syncing disks, vnodes remaining...0 done
>> All buffers synced.
>> GEOM_MIRROR: Device diesel_root: provider mirror/diesel_root destroyed.
>> Uptime: 6m32s
>> GEOM_MIRROR: Device diesel_root destroyed.
>> Rebooting...
>> cpu_reset: Stopping other CPUs
>> spin lock 0xffffffff8078c900 (sched lock 1) held by 0xffffff00014d4ab0
>> (tid 100002) too long
>> panic: spin lock held too long
>> cpuid = 0
>> KDB: enter: panic
>> [thread pid 77 tid 100090 ]
>> Stopped at kdb_enter+0x3d: movq $0,0x48bbd0(%rip)
>> db> bt
>> Tracing pid 77 tid 100090 td 0xffffff000457bab0
>> kdb_enter() at kdb_enter+0x3d
>> panic() at panic+0x17b
>> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
>> _mtx_lock_spin() at _mtx_lock_spin+0x9e
>> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x72
>> sched_balance_group() at sched_balance_group+0xc5
>> sched_balance_group() at sched_balance_group+0x1f8
>> sched_balance() at sched_balance+0xa2
>> sched_clock() at sched_clock+0xf6
>> statclock() at statclock+0xbd
>> lapic_handle_timer() at lapic_handle_timer+0x197
>> Xtimerint() at Xtimerint+0x8c
>> --- interrupt, rip = 0xffffffff80541cc4, rsp = 0xffffff80771dba90, rbp =
>> 0xffffff80771dbab0 ---
>> DELAY() at DELAY+0x64
>> cpu_reset() at cpu_reset+0xdd
>> boot() at boot+0x2e6
>> reboot() at reboot+0x42
>> syscall() at syscall+0x1a5
>> Xfast_syscall() at Xfast_syscall+0xd0
>> --- syscall (55, FreeBSD ELF64, reboot), rip = 0x800788eec, rsp =
>> 0x7fffffffeca8, rbp = 0 ---
>
>
> I've only seen this once. If I should encounter it again, is there
> something you'd like me to look at?

[ Sorry, I'm trying to add everyone who already reported such a problem,
even though I know many of you experienced it on -STABLE ]

Could you try this patch against -CURRENT:
http://www.freebsd.org/~attilio/stop_nmi.diff

This patch basically does 2 things:
1) It removes the STOP_NMI option and adds the infrastructure for using
   NMI on KDB invocation and normal stop IPIs on standard CPU shutdown.
   To accomplish that, and to foresee a better design than what STOP_NMI
   does now, 2 new functions are introduced:
   * ipi_hstop_selected(), which, if the architecture offers such an
     option, sends a "forced" IPI through a privileged channel (NMI on
     amd64 and ia32) in order to stop the CPUs passed in the mask. Note
     that on architectures other than amd64 and ia32,
     ipi_hstop_selected() defaults to ipi_selected(..., STOP_IPI), but
     maintainers who want to override that can simply implement
     something harder.
   * stop_cpus_hard(), which is a 'more powerful' version of stop_cpus()
     that uses ipi_hstop_selected() instead of ipi_selected(...,
     STOP_IPI) in order to stop CPUs.
   In the end, while the shutdown subsystem keeps using stop_cpus(), kdb
   now uses stop_cpus_hard().
2) It disables interrupts on CPU0 while doing the stop_cpus() for the
   others. That prevents spurious fast handlers from preempting CPU0
   while it does the stopping (aka: timerint running hardclock()).

If you can report whether that patch fixes the problem for you, it would
be great.
I'm already well aware that this patch needs an entry in UPDATING too
if we verify it does solve the problem.
If someone wants to port this to STABLE_7 and is faster than me, they
are welcome.
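
To make the dispatch in point 1 concrete, here is a minimal userland
sketch in C. It is not the code from stop_nmi.diff. Only the four
function names come from the description above; the cpumask_t typedef,
the enum ipi_kind values, the last_ipi[] array and the
arch_has_hard_ipi flag are made up for illustration, and the real
kernel functions have different signatures and actually wait for the
CPUs to stop.

```c
/* Userland model only; nothing here is taken from stop_nmi.diff. */
#include <assert.h>
#include <stdint.h>

#define MAXCPU 8                      /* hypothetical CPU count for the model */

typedef uint32_t cpumask_t;           /* hypothetical: one bit per CPU */

/* Hypothetical IPI kinds; the real kernel uses its own constants. */
enum ipi_kind { IPI_NONE, IPI_STOP, IPI_STOP_HARD };

/* Last IPI "delivered" to each simulated CPU. */
enum ipi_kind last_ipi[MAXCPU];

/* Set to 0 to model an architecture without a privileged (NMI-like) channel. */
int arch_has_hard_ipi = 1;

/* Model of ipi_selected(): record `kind` for every CPU named in `mask`. */
void
ipi_selected(cpumask_t mask, enum ipi_kind kind)
{
	assert((mask >> MAXCPU) == 0);    /* only CPUs 0..MAXCPU-1 exist */
	for (int cpu = 0; cpu < MAXCPU; cpu++)
		if (mask & ((cpumask_t)1 << cpu))
			last_ipi[cpu] = kind;
}

/*
 * Model of ipi_hstop_selected(): use the hard channel when the
 * architecture has one, otherwise fall back to the ordinary stop IPI.
 */
void
ipi_hstop_selected(cpumask_t mask)
{
	ipi_selected(mask, arch_has_hard_ipi ? IPI_STOP_HARD : IPI_STOP);
}

/* Shutdown path in the description: ordinary stop IPI. */
void
stop_cpus(cpumask_t mask)
{
	ipi_selected(mask, IPI_STOP);
}

/* KDB path in the description: hard stop where available. */
void
stop_cpus_hard(cpumask_t mask)
{
	ipi_hstop_selected(mask);
}
```

With arch_has_hard_ipi set to 0, stop_cpus_hard() degrades to the same
ordinary stop IPI that stop_cpus() sends, which mirrors the default
described for architectures other than amd64 and ia32.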

Due to the invasiveness of the patch, it would need to be modified if
it is eventually ported to STABLE_7.
I tested it on i386, but I still need to run a make universe. I will
do that ASAP.

* Please don't forget to drop STOP_NMI from your own custom config files *

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein