Date:      Wed, 17 Aug 2011 10:52:01 -0700
From:      Chip Camden <sterling@camdensoftware.com>
To:        freebsd-stable@FreeBSD.org
Subject:   Re: panic: spin lock held too long (RELENG_8 from today)
Message-ID:  <20110817175201.GB1973@libertas.local.camdensoftware.com>
In-Reply-To: <20110818.023832.373949045518579359.hrs@allbsd.org>
References:  <20110707082027.GX48734@deviant.kiev.zoral.com.ua> <4E159959.2070401@sentex.net> <4E15A08C.6090407@sentex.net> <20110818.023832.373949045518579359.hrs@allbsd.org>

Quoth Hiroki Sato on Thursday, 18 August 2011:
> Hi,
>
> Mike Tancsa <mike@sentex.net> wrote
>   in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> mi> >> for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937. This
> mi> >> is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead  ?
>
>  I am also suffering from a reproducible panic on an 8-STABLE box, an
>  NFS server with heavy I/O load.  I could not get a kernel dump
>  because this panic locked up the machine just after it occurred, but
>  according to the stack trace it was the same as posted one.
>  Switching to an 8.2R kernel can prevent this panic.
>
>  Any progress on the investigation?
>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --
>
> -- Hiroki


I'm also getting similar panics on 8.2-STABLE.  Locks up everything and I
have to power off.  Once, I happened to be looking at the console when it
happened and copied down the following:

Sleeping thread (tid 100037, pid 0) owns a non-sleepable lock
panic: sleeping thread
cpuid=1

Another time I got:

lock order reversal:
1st 0xffffff000593e330 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:296
2nd 0xffffff0005e5d578 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587

I didn't copy down the traceback.

These panics seem to hit when I'm doing heavy WAN I/O.  I can go for
about a day without one as long as I stay away from the web or even chat.
Last night this system copied a backup of 35GB over the local network
without failing, but as soon as I hopped onto Firefox this morning, down
she went.  I don't know if that's coincidence or useful data.

I didn't get to say "Thanks" to Eitan Adler for attempting to help me
with this on Monday night.  Thanks, Eitan!

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com



