Date:      Fri, 12 Jul 2013 23:10:51 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Ian FREISLICH <ianf@clue.co.za>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Filesystem wedges caused by r251446
Message-ID:  <20130712201051.GI91021@kib.kiev.ua>
In-Reply-To: <E1Uxhoe-0000d9-Sc@clue.co.za>
References:  <201307110923.06548.jhb@freebsd.org> <201307091202.24493.jhb@freebsd.org> <E1UufRq-0001sg-HG@clue.co.za> <E1UxEWB-0000il-21@clue.co.za> <E1Uxhoe-0000d9-Sc@clue.co.za>

On Fri, Jul 12, 2013 at 08:11:36PM +0200, Ian FREISLICH wrote:
> John Baldwin wrote:
> > On Thursday, July 11, 2013 6:54:35 am Ian FREISLICH wrote:
> > > John Baldwin wrote:
> > > > On Thursday, July 04, 2013 5:03:29 am Ian FREISLICH wrote:
> > > > > Konstantin Belousov wrote:
> > > > > >=20
> > > > > > Care to provide any useful information ?
> > > > > >=20
> > > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-
> > > > handbook/kerneldebug-deadlocks.html
> > > > >=20
> > > > > > Well, the system doesn't deadlock; it's perfectly usable so long
> > > > > > as you don't touch the file that's wedged.  A lot of the time the
> > > > > > userland process is unkillable, but often it is killable.  How do
> > > > > > I get from the PID to where the FS is stuck in the kernel?
> > > > >
> > > > Use kgdb.  'proc <pid>', then 'bt'.
> > >=20
> > > So, I setup a remote kbgd session, but I still can't figure out how
> > > to get at the information we need.
> > >=20
> > > (kgdb) proc 5176
> > > only supported for core file target
> > >=20
> > > In the mean time, I'll just force it to make a core dump from ddb.
> > > However, I can't reacreate the issue while the mirror (gmirror) is
> > > rebuilding, so we'll have to wait for that to finish.
> >=20
> > > Sorry, just run 'sudo kgdb' on the box itself.  You can inspect the running
> > > kernel without having to stop it.
> >
> So, this machine's installworld *always* stalls installing clang.
> The install can be stopped (ctrl-c) leaving behind this process:
>
> root    23147   0.0  0.0   9268  1512  1  D     7:51PM  0:00.01 install -s -o root -g wheel -m 555 clang /usr/bin/clang
>
> This is the backtrace from gdb.  I suspect frame 4.
>
> (kgdb) proc 23147
> [Switching to thread 117 (Thread 100059)]#0  sched_switch (
>     td=0xfffffe000c012920, newtd=0x0, flags=<value optimized out>)
>     at /usr/src/sys/kern/sched_ule.c:1954
> 1954                    cpuid = PCPU_GET(cpuid);
> Current language:  auto; currently minimal
> (kgdb) bt
> #0  sched_switch (td=0xfffffe000c012920, newtd=0x0,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1954
> #1  0xffffffff8047539e in mi_switch (flags=260, newtd=0x0)
>     at /usr/src/sys/kern/kern_synch.c:487
> #2  0xffffffff804acbea in sleepq_wait (wchan=0x0, pri=0)
>     at /usr/src/sys/kern/subr_sleepqueue.c:620
> #3  0xffffffff80474ee9 in _sleep (ident=<value optimized out>,
>     lock=0xffffffff80a20300, priority=84, wmesg=0xffffffff8071129a "wdrain",
>     sbt=<value optimized out>, pr=0, flags=<value optimized out>)
>     at /usr/src/sys/kern/kern_synch.c:249
> #4  0xffffffff804e6523 in waitrunningbufspace ()
>     at /usr/src/sys/kern/vfs_bio.c:564
> #5  0xffffffff804e6073 in bufwrite (bp=<value optimized out>)
>     at /usr/src/sys/kern/vfs_bio.c:1226
> #6  0xffffffff804f05ed in cluster_wbuild (vp=0xfffffe008fec4000, size=32768,
>     start_lbn=136, len=<value optimized out>, gbflags=<value optimized out>)
>     at /usr/src/sys/kern/vfs_cluster.c:1002
> #7  0xffffffff804efbc3 in cluster_write (vp=0xfffffe008fec4000,
>     bp=0xffffff80f83da6f0, filesize=4456448, seqcount=127,
>     gbflags=<value optimized out>) at /usr/src/sys/kern/vfs_cluster.c:592
> #8  0xffffffff805c1032 in ffs_write (ap=0xffffff8121c81850)
>     at /usr/src/sys/ufs/ffs/ffs_vnops.c:801
> #9  0xffffffff8067fe21 in VOP_WRITE_APV (vop=<value optimized out>,
> ---Type <return> to continue, or q <return> to quit---
>     a=<value optimized out>) at vnode_if.c:999
> #10 0xffffffff80511eca in vn_write (fp=0xfffffe006a5f7410,
>     uio=0xffffff8121c81a90, active_cred=0x0, flags=<value optimized out>,
>     td=0x0) at vnode_if.h:413
> #11 0xffffffff8050eb3a in vn_io_fault (fp=0xfffffe006a5f7410,
>     uio=0xffffff8121c81a90, active_cred=0xfffffe006a6ca000, flags=0,
>     td=0xfffffe000c012920) at /usr/src/sys/kern/vfs_vnops.c:983
> #12 0xffffffff804b506a in dofilewrite (td=0xfffffe000c012920, fd=5,
>     fp=0xfffffe006a5f7410, auio=0xffffff8121c81a90,
>     offset=<value optimized out>, flags=0) at file.h:290
> #13 0xffffffff804b4cde in sys_write (td=0xfffffe000c012920,
>     uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:460
> #14 0xffffffff8061807a in amd64_syscall (td=0xfffffe000c012920, traced=0)
>     at subr_syscall.c:134
> #15 0xffffffff806017ab in Xfast_syscall ()
>     at /usr/src/sys/amd64/amd64/exception.S:387
> #16 0x000000000044e75a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

Please apply the (mostly debugging) patch below, then reproduce the issue.
I need the backtrace of the 'main' hung process, assuming it is stuck
in waitrunningbufspace().  Also, from the same kgdb session, print
runningbufreq, runningbufspace and lorunningspace.

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 68021e0..205e9b3 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -474,10 +474,12 @@ runningbufwakeup(struct buf *bp)
 {
 	long space, bspace;
 
-	if (bp->b_runningbufspace == 0)
-		return;
-	space = atomic_fetchadd_long(&runningbufspace, -bp->b_runningbufspace);
 	bspace = bp->b_runningbufspace;
+	if (bspace == 0)
+		return;
+	space = atomic_fetchadd_long(&runningbufspace, -bspace);
+	KASSERT(space >= bspace, ("runningbufspace underflow %ld %ld",
+	    space, bspace));
 	bp->b_runningbufspace = 0;
 	/*
 	 * Only acquire the lock and wakeup on the transition from exceeding
@@ -561,7 +563,7 @@ waitrunningbufspace(void)
 
 	mtx_lock(&rbreqlock);
 	while (runningbufspace > hirunningspace) {
-		++runningbufreq;
+		runningbufreq = 1;
 		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
 	}
 		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
 	}
 	mtx_unlock(&rbreqlock);
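
For readers who want to see what the new KASSERT in the first hunk is checking, here is a rough userspace sketch. It is not the kernel code: buf_model, runningbuf_add() and runningbuf_release() are hypothetical names made up for illustration, and C11 atomic_fetch_add() stands in for atomic_fetchadd_long(). Both return the value held before the addition, so the old global total must be at least as large as the amount being released, otherwise the counter has underflowed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel global in vfs_bio.c. */
static atomic_long runningbufspace;

/* Hypothetical stand-in for struct buf; only the accounting field is modelled. */
struct buf_model {
	long b_runningbufspace;
};

/* Charge a buffer's in-flight write against the global counter. */
static void
runningbuf_add(struct buf_model *bp, long size)
{
	bp->b_runningbufspace = size;
	atomic_fetch_add(&runningbufspace, size);
}

/*
 * Release a buffer's charge, following the ordering used in the patch:
 * read the per-buffer amount once, bail out if it is zero, subtract it
 * from the global counter, then check that the previous global value
 * covered the amount released.  Returns the previous global value, or
 * -1 when the buffer had nothing charged.
 */
static long
runningbuf_release(struct buf_model *bp)
{
	long bspace, space;

	bspace = bp->b_runningbufspace;
	if (bspace == 0)
		return (-1);
	space = atomic_fetch_add(&runningbufspace, -bspace);
	/* Same condition as the KASSERT: no underflow of the global total. */
	assert(space >= bspace);
	bp->b_runningbufspace = 0;
	return (space);
}
```

In this model, releasing the same buffer twice is harmless because the second call sees a zero charge and returns early. The assertion would only fire if the global total were smaller than a buffer's recorded charge, which is the kind of inconsistency the debugging patch is meant to catch.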



