Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 7 Jul 2011 11:20:27 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: panic: spin lock held too long (RELENG_8 from today)
Message-ID:  <20110707082027.GX48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <4E15620A.9030608@FreeBSD.org>
References:  <4E154A63.90600@sentex.net> <4E15620A.9030608@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--g0LfaOww9mCf9g0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 07, 2011 at 10:36:42AM +0300, Andriy Gapon wrote:
> on 07/07/2011 08:55 Mike Tancsa said the following:
> > I did a buildworld on this box to bring it up to RELENG_8 for the BIND
> > fixes.  Unfortunately, the formerly solid box (April 13th kernel)
> > panic'd tonight with
> >=20
> > Unread portion of the kernel message buffer:
> > spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too=
 long
> > panic: spin lock held too long
> > cpuid =3D 0
> > Uptime: 13h30m4s
> > Physical memory: 2035 MB
> >=20
> >=20
> > Its a somewhat busy box taking in mail as well as backups for a few
> > servers over nfs.  At the time, it would have been getting about 250Mb/s
> > inbound on its gigabit interface.  Full core.txt file at
> >=20
> > http://www.tancsa.com/core-jul8-2011.txt
>=20
> I thought that this was supposed to contain output of 'thread apply all b=
t' in
> kgdb.  Anyway, I think that stacktrace for tid 100107 may have some useful
> information.
>=20
> > #0  doadump () at pcpu.h:231
> > 231     pcpu.h: No such file or directory.
> >         in pcpu.h
> > (kgdb) #0  doadump () at pcpu.h:231
> > #1  0xc06fd6d3 in boot (howto=3D260) at /usr/src/sys/kern/kern_shutdown=
.c:429
> > #2  0xc06fd937 in panic (fmt=3DVariable "fmt" is not available.
> > ) at /usr/src/sys/kern/kern_shutdown.c:602
> > #3  0xc06ed95f in _mtx_lock_spin_failed (m=3D0x0)
> >     at /usr/src/sys/kern/kern_mutex.c:490
> > #4  0xc06ed9e5 in _mtx_lock_spin (m=3D0xc0b1d200, tid=3D3312388992, opt=
s=3D0,
> >     file=3D0x0, line=3D0) at /usr/src/sys/kern/kern_mutex.c:526
> > #5  0xc0720254 in sched_add (td=3D0xc5dac5c0, flags=3D0)
> >     at /usr/src/sys/kern/sched_ule.c:1119
> > #6  0xc07203f9 in sched_wakeup (td=3D0xc5dac5c0)
> >     at /usr/src/sys/kern/sched_ule.c:1950
> > #7  0xc07061f8 in setrunnable (td=3D0xc5dac5c0)
> >     at /usr/src/sys/kern/kern_synch.c:499
> > #8  0xc07362af in sleepq_resume_thread (sq=3D0xca0da300, td=3D0xc5dac5c=
0,
> > pri=3DVariable "pri" is not available.
> > )
> >     at /usr/src/sys/kern/subr_sleepqueue.c:751
> > #9  0xc0736e18 in sleepq_signal (wchan=3D0xc5fafe50, flags=3D1, pri=3D0=
, queue=3D0)
> >     at /usr/src/sys/kern/subr_sleepqueue.c:825
> > #10 0xc06b6764 in cv_signal (cvp=3D0xc5fafe50)
> >     at /usr/src/sys/kern/kern_condvar.c:422
> > #11 0xc08eaa0d in xprt_assignthread (xprt=3DVariable "xprt" is not avai=
lable.
> > ) at /usr/src/sys/rpc/svc.c:342
> > #12 0xc08ec502 in xprt_active (xprt=3D0xc95d9600) at
> > /usr/src/sys/rpc/svc.c:378
> > #13 0xc08ee051 in svc_vc_soupcall (so=3D0xc6372ce0, arg=3D0xc95d9600,
> > waitflag=3D1)
> >     at /usr/src/sys/rpc/svc_vc.c:747
> > #14 0xc075bbb1 in sowakeup (so=3D0xc6372ce0, sb=3D0xc6372d34)
> >     at /usr/src/sys/kern/uipc_sockbuf.c:191
> > #15 0xc08447bc in tcp_do_segment (m=3D0xcaa8d200, th=3D0xca6aa824,
> > so=3D0xc6372ce0,
> >     tp=3D0xc63b4d20, drop_hdrlen=3D52, tlen=3D1448, iptos=3D0 '\0', ti_=
locked=3D2)
> >     at /usr/src/sys/netinet/tcp_input.c:1775
> > #16 0xc0847930 in tcp_input (m=3D0xcaa8d200, off0=3D20)
> >     at /usr/src/sys/netinet/tcp_input.c:1329
> > #17 0xc07ddaf7 in ip_input (m=3D0xcaa8d200)
> >     at /usr/src/sys/netinet/ip_input.c:787
> > #18 0xc07b8859 in netisr_dispatch_src (proto=3D1, source=3D0, m=3D0xcaa=
8d200)
> >     at /usr/src/sys/net/netisr.c:859
> > #19 0xc07b8af0 in netisr_dispatch (proto=3D1, m=3D0xcaa8d200)
> >     at /usr/src/sys/net/netisr.c:946
> > #20 0xc07ae5e1 in ether_demux (ifp=3D0xc56ed800, m=3D0xcaa8d200)
> >     at /usr/src/sys/net/if_ethersubr.c:894
> > #21 0xc07aeb5f in ether_input (ifp=3D0xc56ed800, m=3D0xcaa8d200)
> >     at /usr/src/sys/net/if_ethersubr.c:753
> > #22 0xc09977b2 in nfe_int_task (arg=3D0xc56ff000, pending=3D1)
> >     at /usr/src/sys/dev/nfe/if_nfe.c:2187
> > #23 0xc07387ca in taskqueue_run_locked (queue=3D0xc5702440)
> >     at /usr/src/sys/kern/subr_taskqueue.c:248
> > #24 0xc073895c in taskqueue_thread_loop (arg=3D0xc56ff130)
> >     at /usr/src/sys/kern/subr_taskqueue.c:385
> > #25 0xc06d1027 in fork_exit (callout=3D0xc07388a0 <taskqueue_thread_loo=
p>,
> >     arg=3D0xc56ff130, frame=3D0xc538ed28) at /usr/src/sys/kern/kern_for=
k.c:861
> > #26 0xc09a5c24 in fork_trampoline () at
> > /usr/src/sys/i386/i386/exception.s:275
> > (kgdb)
> >=20

BTW, we had a similar panic, "spinlock held too long", the spinlock
is the sched lock N, on busy 8-core box recently upgraded to the
stable/8. Unfortunately, machine hung dumping core, so the stack trace
for the owner thread was not available.

I was unable to make any conclusion from the data that was present.
If the situation is reproducable, you coulld try to revert r221937. This
is pure speculation, though.

--g0LfaOww9mCf9g0e
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk4VbEsACgkQC3+MBN1Mb4iU9gCgxnDJw+3nI7TIfBHBKi2QCTev
DwIAn2Zpb3dOwCkYNf03tBahoyOVYIfB
=jWgx
-----END PGP SIGNATURE-----

--g0LfaOww9mCf9g0e--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110707082027.GX48734>