Date:      Wed, 8 Jul 2009 11:50:36 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Dan Naumov <dan.naumov@gmail.com>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: 7.2-release/amd64: panic, spin lock held too long
Message-ID:  <3bbf2fe10907080250q35899d3dhc2f101b62c6e5306@mail.gmail.com>
In-Reply-To: <cf9b1ee00907080018s3f32c8afr4f65f01ce9ff1f25@mail.gmail.com>
References:  <cf9b1ee00907061812r3da70018i1c8d8d12bb038a80@mail.gmail.com> <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com> <cf9b1ee00907061825r34165c48x6727c50b3219d5fb@mail.gmail.com> <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com> <cf9b1ee00907071757i169d2a82la260798f364054f9@mail.gmail.com> <cf9b1ee00907080018s3f32c8afr4f65f01ce9ff1f25@mail.gmail.com>

2009/7/8 Dan Naumov <dan.naumov@gmail.com>:
> On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov<dan.naumov@gmail.com> wrote:
>> On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao<attilio@freebsd.org> wrote:
>>> 2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
>>>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<attilio@freebsd.org> wrote:
>>>>> 2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
>>>>>> I just got a panic followed by a reboot a few seconds after running
>>>>>> "portsnap update", /var/log/messages shows the following:
>>>>>>
>>>>>> Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>>>> Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>>>> Jul  7 03:49:38 atom kernel: panic: spin lock held too long
>>>>>
>>>>> That's a known bug, affecting -CURRENT as well.
>>>>> The cpustop IPI is handled through an NMI, which means it could
>>>>> interrupt a CPU at any moment, even while it is holding a spinlock,
>>>>> violating one well-known FreeBSD rule.
>>>>> That means that the CPU can stop itself while the thread is holding
>>>>> the sched lock spinlock, without ever releasing it (there is no way
>>>>> to fix that, short of something highly hackish).
>>>>> In the meantime hardclock() wants to schedule something else to run
>>>>> and gets stuck on the thread lock.
>>>>>
>>>>> The ideal fix would involve not using an NMI for serving the cpustop while
>>>>> having a cheap way (not making the common path too hard) to tell
>>>>> hardclock() to avoid scheduling while cpustop is in flight.
>>>>>
>>>>> Thanks,
>>>>> Attilio
>>>>
>>>> Any idea if a fix is being worked on and how unlucky must one be to
>>>> run into this issue, should I expect it to happen again? Is it
>>>> basically completely random?
>>>
>>> I'd like to work on that issue before BETA3 (and backport to
>>> STABLE_7), I'm just time-constrained right now.
>>> It is completely random.
>>>
>>> Thanks,
>>> Attilio
>>
>> Ok, this is getting pretty bad, 23 hours later, I get the same kind of
>> panic, the only difference is that instead of "portsnap update", this
>> was triggered by "portsnap cron" which I have running between 3 and 4
>> am every day:
>>
>> Jul  8 03:03:49 atom kernel: ssppiinn  lloocckk
>> 00xxffffffffffffffff8800bb33eeddc400  ((sscchheedd  lloocck k1 )0 )h
>> ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
>> 10100006070)5 )t otoo ol olnogng
>> Jul  8 03:03:49 atom kernel: p
>> Jul  8 03:03:49 atom kernel: anic: spin lock held too long
>> Jul  8 03:03:49 atom kernel: cpuid = 0
>> Jul  8 03:03:49 atom kernel: Uptime: 23h2m38s
>
> I have now tried reproducing the problem by running "stress --cpu 8 --io
> 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed
> system load into the 15.50 ballpark and simultaneously running
> "portsnap fetch" and "portsnap update" but I couldn't manually trigger
> the panic, it seems that this problem is indeed random (although it
> baffles me why it is specifically portsnap triggering it). I have now
> disabled powerd to check whether that makes any difference to system
> stability.

But is that happening at reboot time?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
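
Editorial sketch: the failure mode described in the quoted explanation
(a CPU is stopped while holding a spin lock, and another CPU then spins
on the same lock until it gives up) can be illustrated in userspace C.
This is not FreeBSD kernel code. The names toy_spinlock, toy_spin_lock,
toy_spin_unlock, toy_simulate_stopped_holder and the TOY_SPIN_LIMIT
budget are all hypothetical, and the "NMI stop" is simulated simply by
never calling unlock. The thread id 100054 is taken from the first log
excerpt; the waiter id 100000 is made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy lock, not the kernel's mutex: 0 = free, else owner tid. */
struct toy_spinlock {
    atomic_uintptr_t owner;
};

/* Hypothetical spin budget; the real kernel threshold is different. */
#define TOY_SPIN_LIMIT 1000000UL

/*
 * Try to take the lock for `tid`. Returns true on success, or false if
 * the lock stayed held for the whole spin budget ("held too long").
 */
static bool
toy_spin_lock(struct toy_spinlock *l, uintptr_t tid)
{
    for (unsigned long spins = 0; spins < TOY_SPIN_LIMIT; spins++) {
        uintptr_t expected = 0;
        if (atomic_compare_exchange_strong(&l->owner, &expected, tid))
            return true;
    }
    return false;
}

/* Release the lock by marking it free. */
static void
toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_store(&l->owner, 0);
}

/*
 * Simulate the scenario: the holder takes the lock and is then "stopped"
 * before it can unlock. Returns true if the second acquirer timed out.
 */
static bool
toy_simulate_stopped_holder(void)
{
    struct toy_spinlock sched_lock = { 0 };

    bool got = toy_spin_lock(&sched_lock, 100054);
    assert(got);
    (void)got;

    /* Simulated stop: the holder never reaches toy_spin_unlock(). */

    /* A second CPU (think hardclock()) now spins until the budget runs out. */
    return !toy_spin_lock(&sched_lock, 100000);
}
```

In this sketch the waiter only returns false. In the reports above, the
kernel instead panics with "spin lock held too long" once its own limit
is exceeded.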


