Date:      Mon, 29 Nov 2010 22:47:15 +0100
From:      Giovanni Trematerra <giovanni.trematerra@gmail.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        Alexander Motin <mav@freebsd.org>, David Rhodus <sdrhodus@gmail.com>, freebsd-current@freebsd.org
Subject:   Re: panic: sched_priority: invalid priority 2906: nice 0, ticks 122865664 ftick 516947 ltick 517947 tick pri 2726
Message-ID:  <AANLkTikn0rkE4CncgD%2BSwg4GDYMAXhuKWDehR_sBAxnH@mail.gmail.com>
In-Reply-To: <AANLkTimAKS_PcnLb_8=zJq-mNd7B=wwoOYu_6LGYg3bk@mail.gmail.com>
References:  <AANLkTimy-2oSgy8E2D-=WO41%2BdSem8MY=ZNCSSH3bBt%2B@mail.gmail.com> <201011291007.37044.jhb@freebsd.org> <4CF3E68C.4050300@FreeBSD.org> <AANLkTimAKS_PcnLb_8=zJq-mNd7B=wwoOYu_6LGYg3bk@mail.gmail.com>

On Mon, Nov 29, 2010 at 9:56 PM, Attilio Rao <attilio@freebsd.org> wrote:
> 2010/11/29 Alexander Motin <mav@freebsd.org>:
>> On 29.11.2010 17:07, John Baldwin wrote:
>>>
>>> On Friday, November 26, 2010 4:38:49 pm David Rhodus wrote:
>>>>
>>>> I hit this panic on my NFS server.
>>>>
>>>> -DR
>>>>
>>>> coke.fun dumped core - see /var/crash/vmcore.2
>>>>
>>>> Fri Nov 26 14:50:48 UTC 2010
>>>>
>>>> FreeBSD coke.fun 9.0-CURRENT FreeBSD 9.0-CURRENT #14 r215800: Wed Nov
>>>> 24 12:35:30 UTC 2010     root@coke.fun:/usr/obj/usr/src/sys/GENERIC
>>>> i386
>>>>
>>>> panic: sched_priority: invalid priority 2906: nice 0, ticks 122865664
>>>> ftick 516947 ltick 517947 tick pri 2726
>>>
>>> I ran the numbers and assuming a hz of 1000, this requires you to have a
>>> very large value for ts_ticks (about (2726 * 24) << 10).  I suspect this
>>> is due to sched_tick() being invoked for a long idle sleep combined with
>>> the eventtimer changes.  Can you go to frame 10 and
>>> 'p td->td_sched->ts_ticks'?
>>
>> As I can see, this is VirtualBox virtual machine. So it is still a question
>> what side makes so large hole in sched_tick() on some CPUs. It could be
>> interesting to get ktr(4) dump with KTR_SPARE2 mask:
>>
>> options         KTR
>> options         ALQ
>> options         KTR_ALQ
>> options         KTR_ENTRIES=131072
>> options         KTR_COMPILE=(KTR_SPARE2)
>> options         KTR_MASK=(KTR_SPARE2)
>
> I'm sure gianni (CC'ed) got this bug
> and got some conclusions on it
> before (maybe he also has a patch).

I hit it on QEMU and assumed that QEMU was not doing a proper job of
distributing run time among the cores (so perhaps VirtualBox has the same
problem?).
I figured out that sched_tick() is being passed a huge number of elapsed
ticks for the CPU at startup, in my case by hardclock_anycpu()
(kern_clock.c).

I don't have a proper patch, only a dirty hack: if a huge number of ticks
is passed to sched_tick(), cap it at 5 seconds' worth (hz*10/2), so that we
never account for more than 5s of solid run time. That is something ULE
can still handle.

Hope this helps.

diff -r d16464301129 sys/kern/kern_clock.c
--- a/sys/kern/kern_clock.c     Thu Sep 23 11:56:35 2010 -0400
+++ b/sys/kern/kern_clock.c     Sun Oct 03 17:53:39 2010 -0400
@@ -525,7 +525,7 @@ hardclock_anycpu(int cnt, int usermode)
              PROC_SUNLOCK(p);
      }
      thread_lock(td);
-       sched_tick(cnt);
+       sched_tick((cnt < (hz*10)/2) ? cnt : (hz*10)/2);
      td->td_flags |= flags;
      thread_unlock(td);
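
To make the clamp easier to read, here is the same expression pulled out
into a standalone userland function. This is only a sketch for
illustration, not kernel code: clamp_ticks() is a hypothetical name, and hz
is passed as a parameter here instead of being the kernel global.

```c
/*
 * Userland sketch of the clamp used in the hack above.
 * clamp_ticks() is a hypothetical helper; in the diff the same
 * expression is written inline in the call to sched_tick().
 */
int
clamp_ticks(int cnt, int hz)
{
	/* (hz * 10) / 2 is 5 seconds' worth of ticks. */
	int limit = (hz * 10) / 2;

	/* Pass small counts through unchanged, cap large ones. */
	return (cnt < limit ? cnt : limit);
}
```

With hz = 1000 the limit is 5000 ticks, so ordinary counts are untouched
and only a huge backlog is cut down before it reaches the scheduler.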

--
Giovanni Trematerra


