From: Attilio Rao
To: Dan Naumov
Cc: FreeBSD-STABLE Mailing List
Date: Wed, 8 Jul 2009 11:50:36 +0200
Subject: Re: 7.2-release/amd64: panic, spin lock held too long
Message-ID: <3bbf2fe10907080250q35899d3dhc2f101b62c6e5306@mail.gmail.com>
References: <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com> <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>

2009/7/8 Dan Naumov:
> On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote:
>> On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote:
>>> 2009/7/7 Dan Naumov:
>>>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote:
>>>>> 2009/7/7 Dan Naumov:
>>>>>> I just got a panic followed by a reboot a few seconds after running
>>>>>> "portsnap update", /var/log/messages shows the following:
>>>>>>
>>>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>>>> Jul 7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long
>>>>>
>>>>> That's a known bug, affecting -CURRENT as well.
>>>>> The cpustop IPI is handled through an NMI, which means it can
>>>>> interrupt a CPU at any moment, even while it is holding a spinlock,
>>>>> violating one well-known FreeBSD rule.
>>>>> That means that the CPU can stop itself while the thread is holding
>>>>> the sched lock spinlock, without releasing it (there is no way, modulo
>>>>> highly hackish, to fix that).
>>>>> In the meantime hardclock() wants to schedule something else to run and
>>>>> gets stuck on the thread lock.
>>>>>
>>>>> The ideal fix would involve not using an NMI for serving the cpustop while
>>>>> having a cheap way (not making the common path too hard) to tell
>>>>> hardclock() to avoid scheduling while cpustop is in flight.
>>>>>
>>>>> Thanks,
>>>>> Attilio
>>>>
>>>> Any idea if a fix is being worked on and how unlucky must one be to
>>>> run into this issue, should I expect it to happen again? Is it
>>>> basically completely random?
>>>
>>> I'd like to work on that issue before BETA3 (and backport to
>>> STABLE_7), I'm just time-constrained right now.
>>> It is completely random.
>>>
>>> Thanks,
>>> Attilio
>>
>> Ok, this is getting pretty bad, 23 hours later, I get the same kind of
>> panic, the only difference is that instead of "portsnap update", this
>> was triggered by "portsnap cron" which I have running between 3 and 4
>> am every day:
>>
>> Jul 8 03:03:49 atom kernel: ssppiinn lloocckk
>> 00xxffffffffffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h
>> ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
>> 10100006070)5 )t otoo ol olnogng
>> Jul 8 03:03:49 atom kernel: p
>> Jul 8 03:03:49 atom kernel: anic: spin lock held too long
>> Jul 8 03:03:49 atom kernel: cpuid = 0
>> Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s
>
> I have now tried repeating the problem by running "stress --cpu 8 --io
> 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed
> system load into the 15.50 ballpark and simultaneously running
> "portsnap fetch" and "portsnap update" but I couldn't manually trigger
> the panic, it seems that this problem is indeed random (although it
> baffles me why it is specifically portsnap triggering it). I have now
> disabled powerd to check whether that makes any difference to system
> stability.

But is that happening at reboot time?

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
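
For readers who want to see the failure mode from the quoted explanation in isolation, here is a toy Python model. It is only a sketch and not FreeBSD kernel code: the names SpinLock, Panic, SPIN_LIMIT, spin_acquire and simulate are all hypothetical, the waiter tid is made up, and the spin bound is an arbitrary assumption. It shows one CPU spinning on a lock with a bounded retry count while the holder either makes progress or has been stopped inside its critical section.

```python
# Toy model of "spin lock held too long"; not FreeBSD kernel code.
# Every name here is hypothetical and chosen for illustration only.

SPIN_LIMIT = 1000  # assumed bound on spin attempts before giving up


class Panic(Exception):
    """Stands in for a kernel panic."""


class SpinLock:
    def __init__(self, name):
        self.name = name
        self.owner = None  # tid of the current holder, None when free

    def try_acquire(self, tid):
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def release(self, tid):
        assert self.owner == tid
        self.owner = None


def spin_acquire(lock, tid, holder_runs):
    """Spin for the lock, panicking after SPIN_LIMIT failed attempts.

    holder_runs() is called once per failed attempt and models the
    other CPU getting a chance to run.
    """
    for _ in range(SPIN_LIMIT):
        if lock.try_acquire(tid):
            return
        holder_runs()
    raise Panic(f"spin lock {lock.name} held by tid {lock.owner} too long")


def simulate(holder_stopped_by_nmi):
    lock = SpinLock("sched lock 1")
    holder_tid = 100054  # tid taken from the first log excerpt
    waiter_tid = 100001  # made-up tid for the spinning CPU
    lock.try_acquire(holder_tid)
    remaining = [10]  # steps the holder needs before it releases

    def holder_runs():
        if holder_stopped_by_nmi:
            return  # CPU stopped inside the critical section: no progress
        remaining[0] -= 1
        if remaining[0] == 0:
            lock.release(holder_tid)

    try:
        spin_acquire(lock, waiter_tid, holder_runs)
        return "acquired"
    except Panic as exc:
        return f"panic: {exc}"


if __name__ == "__main__":
    print(simulate(holder_stopped_by_nmi=False))
    print(simulate(holder_stopped_by_nmi=True))
```

With a holder that keeps running, the waiter gets the lock after a few spins. With a holder that was stopped while owning the lock, the waiter exhausts its spin budget and raises the stand-in panic, which is the shape of the problem described in the thread.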