Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 2 Jun 2011 09:12:37 +1000
From:      Peter Jeremy <peterjeremy@acm.org>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 'make -j16 universe' gives SIReset
Message-ID:  <20110601231237.GA5267@server.vk2pj.dyndns.org>
In-Reply-To: <20110527120659.GA78000@alchemy.franken.de>
References:  <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de>

next in thread | previous in thread | raw e-mail | index | archive | help

--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2011-May-27 14:06:59 +0200, Marius Strobl <marius@alchemy.franken.de> wr=
ote:
>On Fri, May 27, 2011 at 09:47:28AM +1000, Peter Jeremy wrote:
>> I tried a "make -j16 universe" using a recent 8-stable on a 16-CPU
>> V890 and after about 11 minutes, I got the following.  This box

>It probably doesn't hurt to check the hardware with SunVTS though.

I ran a comprehensive SunVTS test for just under 24 hours without
any errors and then checked that an unmodified kernel still SIR'd
(which it did).

>ignores a certain amount of these). You could try whether the following
>patch makes any difference to the SIR you're seeing:
>http://people.freebsd.org/~marius/sparc64_intr_vector_stray.diff

Running -current, I didn't see any stray interrupts but I did get some
"spin lock held too long" panics.  Unfortunately, I don't seem to get
crashdumps - with DDB/KDB in the kernel, I get a hard hang - RSC
break doesn't work and the only way out is a RSC reset.

spin lock 0xc07914d8 (sched lock) held by 0xfffff8a29749d980 (tid 100443) t=
oo long
panic: spin lock held too long
cpuid =3D 14
KDB: stack backtrace:
panic() at panic+0x1c8
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x50
_mtx_lock_spin() at _mtx_lock_spin+0xac
sched_add() at sched_add+0x9c
sched_wakeup() at sched_wakeup+0x44
setrunnable() at setrunnable+0xb8
sleepq_resume_thread() at sleepq_resume_thread+0xfc
sleepq_broadcast() at sleepq_broadcast+0xbc
wakeup() at wakeup+0x18
exit1() at exit1+0x10b0
sys_exit() at sys_exit+0x10
syscallenter() at syscallenter+0x2f4
syscall() at syscall+0x30
-- syscall (1, FreeBSD ELF64, sys_exit) %o7=3D0x128ad0 --
userland() at 0x14b748
user trace: trap %o7=3D0x128ad0
pc 0x14b748, sp 0x7fdffffdb81


spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
panic: spin lock held too long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
cpuid =3D 3
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
KDB: stack backtrace:
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) t=
oo long
pspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
aspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
nspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
ispin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
cspin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
(spin lock 0xc07914d8 (sched lock) held by 0xfffff8a0168f3980 (tid 100312) =
too long
) at panic+0x1c8
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x50
_thread_lock_flags() at _thread_lock_flags+0xf8
statclock() at statclock+0x144
handleevents() at handleevents+0x120
timercb() at timercb+0x4c0
tick_intr() at tick_intr+0x1fc
-- interrupt level=3D0xe pil=3D0 %o7=3D0x5faa28 --
userland() at 0x5faa40
user trace: trap %o7=3D0x5faa28
pc 0x5faa40, sp 0x7fdffffb5f1
pc 0x5feab0, sp 0x7fdffffbf01
pc 0x600310, sp 0x7fdffffc011
pc 0x600628, sp 0x7fdffffc141
pc 0x6035f4, sp 0x7fdffffc251
pc 0x603b1c, sp 0x7fdffffc361
pc 0x605b14, sp 0x7fdffffc471
pc 0x5e6aa8, sp 0x7fdffffc551
pc 0x19c594, sp 0x7fdffffc611
pc 0x5a4aa0, sp 0x7fdffffc6d1
pc 0x454e08, sp 0x7fdffffc791
pc 0x4a65c0, sp 0x7fdffffc8c1
pc 0x4a0558, sp 0x7fdffffc981
pc 0x4a06c8, sp 0x7fdffffca41
pc 0x1924e0, sp 0x7fdffffcb01
pc 0x192684, sp 0x7fdffffcbc1
pc 0x1926a4, sp 0x7fdffffcc81
pc 0x1926a4, sp 0x7fdffffcd41
pc 0x54c710, sp 0x7fdffffce01
pc 0x144080, sp 0x7fdffffcec1
pc 0x17d698, sp 0x7fdffffcf81
pc 0x17e280, sp 0x7fdffffd041
pc 0x148710, sp 0x7fdffffd101
pc 0x1b9264, sp 0x7fdffffd1d1
pc 0x1001d0, sp 0x7fdffffd291
pc 0, sp 0x7fdffffd351
done
KDB: enter: panic

On -stable, I didn't see any stray interrupts but had the system hang
several times, needing a RSC reset to recover.

>and once with:
>	val &=3D ~DCR_SI;

I tried adding this and the system survived a "make -j30 universe" on
-stable (BTW "make universe" seems to have issues cross-building x86
derivatives).  I'm now trying that on -current.  I'm not sure what the
implications of the above change are.

--=20
Peter Jeremy

--UugvWAfsgieZRqgk
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)

iEYEARECAAYFAk3mx2UACgkQ/opHv/APuIcazgCfeS5I0HM5ODCLabzxtfkz42zC
uvQAn3rjYotdp/KYnUNY3cYdk1gEyc8g
=1v5M
-----END PGP SIGNATURE-----

--UugvWAfsgieZRqgk--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110601231237.GA5267>