From owner-freebsd-stable@FreeBSD.ORG Thu Jul 16 07:06:01 2009
Message-ID: <4A5ECC8C.7030808@gmail.com>
Date: Thu, 16 Jul 2009 14:45:32 +0800
From: "C. C. Tang"
To: freebsd-stable@freebsd.org
References: <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com>
 <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>
 <3bbf2fe10907080250q35899d3dhc2f101b62c6e5306@mail.gmail.com>
In-Reply-To: <3bbf2fe10907080250q35899d3dhc2f101b62c6e5306@mail.gmail.com>
Subject: Re: 7.2-release/amd64: panic, spin lock held too long
List-Id: Production branch of FreeBSD source code

Attilio Rao wrote:
> 2009/7/8 Dan Naumov:
>> On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote:
>>> On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote:
>>>> 2009/7/7 Dan Naumov:
>>>>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote:
>>>>>> 2009/7/7 Dan Naumov:
>>>>>>> I just got a panic following by a reboot a few seconds after running
>>>>>>> "portsnap update", /var/log/messages shows the following:
>>>>>>>
>>>>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>>>>> Jul 7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long
>>>>>> That's a known bug, affecting -CURRENT as well.
>>>>>> The cpustop IPI is handled though an NMI, which means it could
>>>>>> interrupt a CPU in any moment, even while holding a spinlock,
>>>>>> violating one well known FreeBSD rule.
>>>>>> That means that the cpu can stop itself while the thread was holding
>>>>>> the sched lock spinlock and not releasing it (there is no way, modulo
>>>>>> highly hackish, to fix that).
>>>>>> In the while hardclock() wants to schedule something else to run and
>>>>>> got stuck on the thread lock.
>>>>>>
>>>>>> Ideal fix would involve not using a NMI for serving the cpustop while
>>>>>> having a cheap way (not making the common path too hard) to tell
>>>>>> hardclock() to avoid scheduling while cpustop is in flight.
>>>>>>
>>>>>> Thanks,
>>>>>> Attilio
>>>>> Any idea if a fix is being worked on and how unlucky must one be to
>>>>> run into this issue, should I expect it to happen again? Is it
>>>>> basically completely random?
>>>> I'd like to work on that issue before BETA3 (and backport to
>>>> STABLE_7), I'm just time-constrained right now.
>>>> it is completely random.
>>>>
>>>> Thanks,
>>>> Attilio
>>> Ok, this is getting pretty bad, 23 hours later, I get the same kind of
>>> panic, the only difference is that instead of "portsnap update", this
>>> was triggered by "portsnap cron" which I have running between 3 and 4
>>> am every day:
>>>
>>> Jul 8 03:03:49 atom kernel: ssppiinn lloocckk
>>> 00xxffffffffffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h
>>> ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
>>> 10100006070)5 )t otoo ol olnogng
>>> Jul 8 03:03:49 atom kernel: p
>>> Jul 8 03:03:49 atom kernel: anic: spin lock held too long
>>> Jul 8 03:03:49 atom kernel: cpuid = 0
>>> Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s
>> I have now tried repeating the problem by running "stress --cpu 8 --io
>> 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed
>> system load into the 15.50 ballpark and simultaneously running
>> "portsnap fetch" and "portsnap update" but I couldn't manually trigger
>> the panic, it seems that this problem is indeed random (although it
>> baffles me why is it specifically portsnap triggering it). I have now
>> disabled powerd to check whether that makes any difference to system
>> stability.
>
> But is that happening at reboot time?
>
> Thanks,
> Attilio
>

I think I am also having a similar problem on my Atom machine
(FreeBSD 7.2-RELEASE-p1). It does not happen at boot or reboot; the
machine panics at random. I have found that it has remained stable for
more than a month since I disabled powerd (although I would like to
have it enabled).

--
C.C.
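
The failure mode Attilio describes in the quoted text can be modeled in userland. What follows is only a rough sketch under stated assumptions, not FreeBSD's spin mutex code: toy_spinlock, toy_spin_acquire, toy_spin_release and max_spins are hypothetical names invented for illustration. A waiter retries a bounded number of times and then gives up, which corresponds to the point where the kernel reports "spin lock held too long" because the holder (for example, a CPU stopped by an NMI while holding the sched lock) never releases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userland model; NOT the FreeBSD kernel implementation. */
struct toy_spinlock {
	atomic_uintptr_t owner;	/* 0 when free, otherwise the holder's id */
};

/*
 * Try to take the lock for `self`, making at most max_spins + 1 attempts.
 * Returns false if the lock stayed held the whole time; a kernel would
 * treat that as fatal rather than return to the caller.
 */
static bool
toy_spin_acquire(struct toy_spinlock *l, uintptr_t self,
    unsigned long max_spins)
{
	for (unsigned long i = 0; i <= max_spins; i++) {
		uintptr_t expected = 0;

		/* Succeeds only if nobody currently owns the lock. */
		if (atomic_compare_exchange_strong_explicit(&l->owner,
		    &expected, self, memory_order_acquire,
		    memory_order_relaxed))
			return (true);
	}
	return (false);	/* holder never released: "held too long" */
}

/* Release the lock; only the current holder may do so. */
static void
toy_spin_release(struct toy_spinlock *l, uintptr_t self)
{
	assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
	atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

In this model, if holder 1 takes the lock and is then frozen, every later call to toy_spin_acquire() from another id returns false however large max_spins is, because nothing will ever call toy_spin_release() for holder 1.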