Date:      Thu, 24 Jan 2013 23:22:12 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Christian Gusenbauer <c47g@gmx.at>
Cc:        freebsd-fs@freebsd.org, net@freebsd.org
Subject:   Re: 9.1-stable crashes while copying data from a NFS mounted directory
Message-ID:  <20130124212212.GM2522@kib.kiev.ua>
In-Reply-To: <201301242150.52238.c47g@gmx.at>
References:  <201301241805.57623.c47g@gmx.at> <201301241950.49455.c47g@gmx.at> <20130124193709.GL2522@kib.kiev.ua> <201301242150.52238.c47g@gmx.at>

On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer wrote:
> On Thursday 24 January 2013 20:37:09 Konstantin Belousov wrote:
> > On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian Gusenbauer wrote:
> > > On Thursday 24 January 2013 19:07:23 Konstantin Belousov wrote:
> > > > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
> > > > > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:
> > > > > > > Hi!
> > > > > > >
> > > > > > > I'm using 9.1 stable svn revision 245605 and I get the panic below
> > > > > > > if I execute the following commands (as single user):
> > > > > > >
> > > > > > > # swapon -a
> > > > > > > # dumpon /dev/ada0s3b
> > > > > > > # mount -u /
> > > > > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > > > > > > # mount -t nfs -o rsize=32768 data:/multimedia /mnt
> > > > > > > # cp /mnt/Movies/test/a.m2ts /tmp
> > > > > > >
> > > > > > > then the system panics almost immediately. I'll attach the stack
> > > > > > > trace.
> > > > > > >
> > > > > > > Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network,
> > > > > > > maybe that's the cause for the panic, because the bcopy (see stack
> > > > > > > frame #15) fails.
> > > > > > >
> > > > > > > Any clues?
> > > > > >
> > > > > > I tried a similar operation with the nfs mount of rsize=32768 and mtu
> > > > > > 6144, but the machine runs HEAD and em instead of age. I was unable
> > > > > > to reproduce the panic on the copy of the 5GB file from nfs mount.
> > > >
> > > > Hmmm, I did a quick test. If I do not change the MTU, so just configuring
> > > > age0 with
> > > >
> > > > # ifconfig age0 inet 192.168.2.2 up
> > > >
> > > > then I can copy all files from the mounted directory without any
> > > > problems, too. So it's probably age0 related?
> > >
> > From your backtrace and the buffer printout, I see something strange.
> > The buffer data address is 0xffffff8171418000, while the kernel faulted
> > on an attempt to write at 0xffffff8171413000, which is lower than
> > the buffer data pointer, during the bcopy to the buffer.
> >
> > The other data suggests that there was no overflow of the data from the
> > server response. So it might be that mbuf_len(mp) returned a negative
> > number? I am not sure whether that is possible at all.
> >
> > Try this debugging patch, please. You need to add INVARIANTS etc. to the
> > kernel config.
> >
> > diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c
> > index efc0786..9a6bda5 100644
> > --- a/sys/fs/nfs/nfs_commonsubs.c
> > +++ b/sys/fs/nfs/nfs_commonsubs.c
> > @@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> >  				}
> >  				mbufcp = NFSMTOD(mp, caddr_t);
> >  				len = mbuf_len(mp);
> > +				KASSERT(len > 0, ("len %d", len));
> >  			}
> >  			xfer = (left > len) ? len : left;
> >  #ifdef notdef
> > @@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
> >  			uiop->uio_resid -= xfer;
> >  		}
> >  		if (uiop->uio_iov->iov_len <= siz) {
> > +			KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
> > +			    uiop->uio_iovcnt));
> >  			uiop->uio_iovcnt--;
> >  			uiop->uio_iov++;
> >  		} else {
> >
> > I thought that the server had returned too long a response, but from
> > your data that does not seem to be the case. Still, I think the patch
> > below might be due.
> >
> > diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c
> > index be0476a..a89b907 100644
> > --- a/sys/fs/nfsclient/nfs_clrpcops.c
> > +++ b/sys/fs/nfsclient/nfs_clrpcops.c
> > @@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio *uiop, struct ucred *cred,
> >  			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
> >  			eof = fxdr_unsigned(int, *tl);
> >  		}
> > -		NFSM_STRSIZ(retlen, rsize);
> > +		NFSM_STRSIZ(retlen, len);
> >  		error = nfsm_mbufuio(nd, uiop, retlen);
> >  		if (error)
> >  			goto nfsmout;
>
> I applied your patches and now I get a
>
> panic: len -4
> cpuid = 1
> KDB: enter: panic
> Dumping 377 out of 6116 MB:..5%..13%..22%..34%..43%..51%..64%..73%..81%..94%
>
This means that the age driver either produced a corrupted mbuf chain,
or stored a wrong (negative) value in the mbuf len field. I am quite
certain that the issue is in the driver.

I added net@ to Cc:; hopefully you can get help there.
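
For illustration only, here is a minimal userland sketch of the kind of
check the debugging KASSERT stands for: walk a chain of segments and
refuse to copy when a segment reports a negative length. The names
fake_mbuf and chain_copy are hypothetical stand-ins, not the kernel's
struct mbuf or nfsm_mbufuio(), and the error codes are picked arbitrarily
for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userland stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct fake_mbuf {
	struct fake_mbuf *next;
	const char *data;
	int len;		/* signed, so a corrupted value can be negative */
};

/*
 * Copy exactly 'siz' bytes from the chain into 'dst'.
 * Returns 0 on success, EBADMSG if a segment has a negative length,
 * or EINVAL if the chain ends before 'siz' bytes were copied.
 * (Error codes are an assumption of this sketch.)
 */
static int
chain_copy(const struct fake_mbuf *mp, char *dst, int siz)
{
	int left = siz;
	int xfer;

	assert(dst != NULL && siz >= 0);
	while (left > 0) {
		if (mp == NULL)
			return (EINVAL);
		if (mp->len < 0)	/* e.g. the "len -4" seen in the panic */
			return (EBADMSG);
		xfer = (left > mp->len) ? mp->len : left;
		if (xfer > 0) {		/* empty segments are simply skipped */
			memcpy(dst, mp->data, (size_t)xfer);
			dst += xfer;
			left -= xfer;
		}
		mp = mp->next;
	}
	return (0);
}
```

With a segment whose length is -4, as in the panic message, chain_copy()
returns EBADMSG instead of handing a negative count to the copy routine.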
>
> #0  doadump (textdump=0)
>     at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> 265             if (textdump && textdump_pending) {
> (kgdb) #0  doadump (textdump=0)
>     at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:265
> #1  0xffffffff802a7490 in db_dump (dummy=<value optimized out>,
>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>     dummy4=<value optimized out>)
>     at /spare/tmp/src-stable9/sys/ddb/db_command.c:538
> #2  0xffffffff802a6a7e in db_command (last_cmdp=0xffffffff808ca140,
>     cmd_table=<value optimized out>, dopager=1)
>     at /spare/tmp/src-stable9/sys/ddb/db_command.c:449
> #3  0xffffffff802a6cd0 in db_command_loop ()
>     at /spare/tmp/src-stable9/sys/ddb/db_command.c:502
> #4  0xffffffff802a8e29 in db_trap (type=<value optimized out>,
>     code=<value optimized out>)
>     at /spare/tmp/src-stable9/sys/ddb/db_main.c:231
> #5  0xffffffff803bf548 in kdb_trap (type=3, code=0, tf=0xffffff81b2ba1080)
>     at /spare/tmp/src-stable9/sys/kern/subr_kdb.c:649
> #6  0xffffffff80594c28 in trap (frame=0xffffff81b2ba1080)
>     at /spare/tmp/src-stable9/sys/amd64/amd64/trap.c:579
> #7  0xffffffff8057e06f in calltrap ()
>     at /spare/tmp/src-stable9/sys/amd64/amd64/exception.S:228
> #8  0xffffffff803beffb in kdb_enter (why=0xffffffff8060ebcf "panic",
>     msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63
> #9  0xffffffff80389391 in panic (fmt=<value optimized out>)
>     at /spare/tmp/src-stable9/sys/kern/kern_shutdown.c:627
> #10 0xffffffff81e5bab2 in nfsm_mbufuio (nd=0xffffff81b2ba1340, uiop=0x7cf,
>     siz=18)
>     at /spare/tmp/src-stable9/sys/modules/nfscommon/../../fs/nfs/nfs_commonsubs.c:202
> #11 0xffffffff81e195c1 in nfsrpc_read (vp=0xfffffe0006c94dc8,
>     uiop=0xffffff81b2ba15c0, cred=<value optimized out>,
>     p=0xfffffe0006aa6490, nap=0xffffff81b2ba14a0,
>     attrflagp=0xffffff81b2ba156c, stuff=0x0)
>     at /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clrpcops.c:1343
> #12 0xffffffff81e3bd80 in ncl_readrpc (vp=0xfffffe0006c94dc8,
>     uiop=0xffffff81b2ba15c0, cred=<value optimized out>)
>     at /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clvnops.c:1366
> #13 0xffffffff81e3086b in ncl_doio (vp=0xfffffe0006c94dc8,
>     bp=0xffffff816f8f4120, cr=0xfffffe0002d58e00, td=0xfffffe0006aa6490,
>     called_from_strategy=0)
>     at /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clbio.c:1605
> #14 0xffffffff81e3254f in ncl_bioread (vp=0xfffffe0006c94dc8,
>     uio=0xffffff81b2ba1ad0, ioflag=<value optimized out>,
>     cred=0xfffffe0002d58e00)
>     at /spare/tmp/src-stable9/sys/modules/nfscl/../../fs/nfsclient/nfs_clbio.c:541
> #15 0xffffffff80434ae8 in vn_read (fp=0xfffffe0006abda50,
>     uio=0xffffff81b2ba1ad0, active_cred=<value optimized out>,
>     flags=<value optimized out>, td=<value optimized out>) at vnode_if.h:384
> #16 0xffffffff8043206e in vn_io_fault (fp=0xfffffe0006abda50,
>     uio=0xffffff81b2ba1ad0, active_cred=0xfffffe0002d58e00, flags=0,
>     td=0xfffffe0006aa6490) at /spare/tmp/src-stable9/sys/kern/vfs_vnops.c:903
> #17 0xffffffff803d7ac1 in dofileread (td=0xfffffe0006aa6490, fd=3,
>     fp=0xfffffe0006abda50, auio=0xffffff81b2ba1ad0,
>     offset=<value optimized out>, flags=0) at file.h:287
> #18 0xffffffff803d7e1c in kern_readv (td=0xfffffe0006aa6490, fd=3,
>     auio=0xffffff81b2ba1ad0)
>     at /spare/tmp/src-stable9/sys/kern/sys_generic.c:250
> #19 0xffffffff803d7f34 in sys_read (td=<value optimized out>,
>     uap=<value optimized out>)
>     at /spare/tmp/src-stable9/sys/kern/sys_generic.c:166
> #20 0xffffffff80593cb3 in amd64_syscall (td=0xfffffe0006aa6490, traced=0)
>     at subr_syscall.c:135
> #21 0xffffffff8057e357 in Xfast_syscall ()
>     at /spare/tmp/src-stable9/sys/amd64/amd64/exception.S:387
> #22 0x00000008009245fc in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
