From owner-freebsd-fs@FreeBSD.ORG  Fri Jan 25 15:25:00 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 440A6DE9;
 Fri, 25 Jan 2013 15:25:00 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D8AD328E;
 Fri, 25 Jan 2013 15:24:58 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEAHKiAlGDaFvO/2dsb2JhbABEhka4I3OCHgEBAQMBAQEBIAQnIAsFFg4KAgINGQIpAQkmBggCBQQBHASHaAYMq3SSVoEji2EQgxiBEwOIYYp+gi6BHI8sgxWBUTU
X-IronPort-AV: E=Sophos;i="4.84,538,1355115600"; d="scan'208";a="10883058"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu.net.uoguelph.ca with ESMTP; 25 Jan 2013 10:24:51 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7FB8EB3F4E;
 Fri, 25 Jan 2013 10:24:51 -0500 (EST)
Date: Fri, 25 Jan 2013 10:24:51 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <582361609.2352469.1359127491504.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <1436756651.2351478.1359126414802.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: 9.1-stable crashes while copying data from a NFS mounted directory
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Christian Gusenbauer <c47g@gmx.at>,
 yongari@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 25 Jan 2013 15:25:00 -0000

I wrote:
> Konstantin Belousov wrote:
> > On Thu, Jan 24, 2013 at 09:39:38PM -0500, Rick Macklem wrote:
> > > John Baldwin wrote:
> > > > On Thursday, January 24, 2013 4:22:12 pm Konstantin Belousov
> > > > wrote:
> > > > > On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer
> > > > > wrote:
> > > > > > On Thursday 24 January 2013 20:37:09 Konstantin Belousov
> > > > > > wrote:
> > > > > > > On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian
> > > > > > > Gusenbauer
> > > > > > > wrote:
> > > > > > > > On Thursday 24 January 2013 19:07:23 Konstantin Belousov
> > > > > > > > wrote:
> > > > > > > > > On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin
> > > > > > > > > Belousov wrote:
> > > > > > > > > > On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian
> > > > > > > > > > Gusenbauer wrote:
> > > > > > > > > > > Hi!
> > > > > > > > > > >
> > > > > > > > > > > I'm using 9.1 stable svn revision 245605 and I get
> > > > > > > > > > > the
> > > > > > > > > > > panic below
> > > > > > > > > > > if I execute the following commands (as single
> > > > > > > > > > > user):
> > > > > > > > > > >
> > > > > > > > > > > # swapon -a
> > > > > > > > > > > # dumpon /dev/ada0s3b
> > > > > > > > > > > # mount -u /
> > > > > > > > > > > # ifconfig age0 inet 192.168.2.2 mtu 6144 up
> > > > > > > > > > > # mount -t nfs -o rsize=32768 data:/multimedia
> > > > > > > > > > > /mnt
> > > > > > > > > > > # cp /mnt/Movies/test/a.m2ts /tmp
> > > > > > > > > > >
> > > > > > > > > > > then the system panics almost immediately. I'll
> > > > > > > > > > > attach
> > > > > > > > > > > the stack
> > > > > > > > > > > trace.
> > > > > > > > > > >
> > > > > > > > > > > Note, that I'm using jumbo frames (6144 byte) on a
> > > > > > > > > > > 1Gbit
> > > > > > > > > > > network,
> > > > > > > > > > > maybe that's the cause for the panic, because the
> > > > > > > > > > > bcopy
> > > > > > > > > > > (see stack
> > > > > > > > > > > frame #15) fails.
> > > > > > > > > > >
> > > > > > > > > > > Any clues?
> > > > > > > > > >
> > > > > > > > > > I tried a similar operation with the nfs mount of
> > > > > > > > > > rsize=32768 and mtu
> > > > > > > > > > 6144, but the machine runs HEAD and em instead of
> > > > > > > > > > age.
> > > > > > > > > > I
> > > > > > > > > > was unable
> > > > > > > > > > to reproduce the panic on the copy of the 5GB file
> > > > > > > > > > from
> > > > > > > > > > nfs mount.
> > > > > > > >
> > > > > > > > Hmmm, I did a quick test. If I do not change the MTU, so
> > > > > > > > just
> > > > > > > > configuring
> > > > > > > > age0 with
> > > > > > > >
> > > > > > > > # ifconfig age0 inet 192.168.2.2 up
> > > > > > > >
> > > > > > > > then I can copy all files from the mounted directory
> > > > > > > > without
> > > > > > > > any
> > > > > > > > problems, too. So it's probably age0 related?
> > > > > > >
> > > > > > > From your backtrace and the buffer printout, I see
> > > > > > > somewhat
> > > > > > > strange thing.
> > > > > > > The buffer data address is 0xffffff8171418000, while
> > > > > > > kernel
> > > > > > > faulted
> > > > > > > at the attempt to write at 0xffffff8171413000, which is is
> > > > > > > lower
> > > > > > > then
> > > > > > > the buffer data pointer, at the attempt to bcopy to the
> > > > > > > buffer.
> > > > > > >
> > > > > > > The other data suggests that there were no overflow of the
> > > > > > > data
> > > > > > > from the
> > > > > > > server response. So it might be that mbuf_len(mp) returned
> > > > > > > negative number
> > > > > > > ? I am not sure is it possible at all.
> > > > > > >
> > > > > > > Try this debugging patch, please. You need to add
> > > > > > > INVARIANTS
> > > > > > > etc
> > > > > > > to the
> > > > > > > kernel config.
> > > > > > >
> > > > > > > diff --git a/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > b/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > index efc0786..9a6bda5 100644
> > > > > > > --- a/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > +++ b/sys/fs/nfs/nfs_commonsubs.c
> > > > > > > @@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript
> > > > > > > *nd,
> > > > > > > struct uio
> > > > > > > *uiop, int siz) }
> > > > > > >  				mbufcp = NFSMTOD(mp, caddr_t);
> > > > > > >  				len = mbuf_len(mp);
> > > > > > > + KASSERT(len > 0, ("len %d", len));
> > > > > > >  			}
> > > > > > >  			xfer = (left > len) ? len : left;
> > > > > > >  #ifdef notdef
> > > > > > > @@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript
> > > > > > > *nd,
> > > > > > > struct uio
> > > > > > > *uiop, int siz) uiop->uio_resid -= xfer;
> > > > > > >  		}
> > > > > > >  		if (uiop->uio_iov->iov_len <= siz) {
> > > > > > > + KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
> > > > > > > + uiop->uio_iovcnt));
> > > > > > >  			uiop->uio_iovcnt--;
> > > > > > >  			uiop->uio_iov++;
> > > > > > >  		} else {
> > > > > > >
> > > > > > > I thought that server have returned too long response, but
> > > > > > > it
> > > > > > > seems to
> > > > > > > be not the case from your data. Still, I think the patch
> > > > > > > below
> > > > > > > might be
> > > > > > > due.
> > > > > > >
> > > > > > > diff --git a/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > b/sys/fs/nfsclient/nfs_clrpcops.c index be0476a..a89b907
> > > > > > > 100644
> > > > > > > --- a/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > +++ b/sys/fs/nfsclient/nfs_clrpcops.c
> > > > > > > @@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct
> > > > > > > uio
> > > > > > > *uiop, struct
> > > > > > > ucred *cred, NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
> > > > > > >  			eof = fxdr_unsigned(int, *tl);
> > > > > > >  		}
> > > > > > > - NFSM_STRSIZ(retlen, rsize);
> > > > > > > + NFSM_STRSIZ(retlen, len);
> > > > > > >  		error = nfsm_mbufuio(nd, uiop, retlen);
> > > > > > >  		if (error)
> > > > > > >  			goto nfsmout;
> > > > > >
> > > I think this patch is appropriate, although I don't see it as too
> > > critical. It just tightens the "sanity check" on the read reply
> > > length (which should never exceed what the client requested).
> > I agree, but client cannot control the server response.
> Righto. However, the server can put the correct "size" at the
> beginning
> of the reply and then follow it with the wrong amount of data. Those
> are
> the cases that the EBADRPC reply from nfsm_mbufuio() hopefully
> catches.
> 
> But, I agree that changing "rsize" to "len" is correct in this case
> and
> would catch the case where the server replies with too large a "size".
> 
> > Anyway, I
> > think
> > there is too much things that could go wrong if the server actively
> > exploit the client code.
> >
> > >
> > > nfsm_mbufuio() shouldn't transfer more than the uio structure can
> > > handle, even if the replied read size is larger than requested.
> > Yes, this what happen, I suppose, due to the decrement of the
> > uio_iovcnt
> > and the EBADRPC error return at the beginning of loop. But IMO the
> > situation should be catched and asserted instead. This is why I
> > added
> > KASSERT(uio_iovcnt > 1) before the decrement.
> >
> > I do not think that we should both add my KASSERT for iovcnt and
> > leave
> > the EBADRPC return. What is your preference there ?
> >
> Well, if a server sends a reply with "size == 16384", but then follows
> it with 32768bytes of data, I don't think that should panic the
> client,
> since it isn't a client bug. I think this is where the EBADRPC reply
> for
> iovcnt will happen.
Oops, I realize this wouldn't do the iovcnt EBADRPC reply. I would just
use the bytes after 16384 as the next field of the reply. I think this
one can be a KASSERT(), assuming the patch that changes "rsize" to "len"
for NFSM_STRSIZ() is applied.

I'll take another look at the code and email again, if I still haven't
gotten this right;-)

rick

> The reverse: A reply with "size == 32768", but followed by 16384bytes
> of
> data, would hit the end of the mbuf list. I don't think that should be
> a KASSERT() either since, again, it is a server bug and not a client
> one.
> 
> Now, the case where the mbuf list is bogus (bad m_len or ???) I see
> the argument for KASSERT()s, since it is a bug somewhere in the client
> machine and that bug may soon (or have already) corrupted other data
> structures, even if there is no damage done for this case once caught.
> I'm fine with KASSERT()s for these. (A KASSERT() panic with a message
> is probably easier to debug than a bcopy() crash, although you did the
> latter amazingly well;-)
> 
> > >
> > > It does seem that nfsm_mbufuio() should apply a sanity check on
> > > m_len. I think m_len == 0 is ok, but negative or very large should
> > > be checked for. Maybe just return EBADRPC after a printf() instead
> > > of a KASSERT(), as a safety belt against a trashed m_len from a
> > > driver
> > > or ???
> > The same as with the overflowed size, it would only hide another bug
> > in
> > the kernel. Probably, the assert m_len >= 0 and other useful
> > assertions
> > should be performed in the central place in the network stack, to
> > catch
> > an error earlier and for all consumers.
> >
> Yes. That sounds like a good idea to me. To be honest, there are a lot
> of places where a bogus mbuf list can trash the NFS code, so catching
> it in a central place would be good, I think.
> 
> Thanks for looking into this, rick
> 
> > >
> > > rick
> > >
> > > > > > I applied your patches and now I get a
> > > > > >
> > > > > > panic: len -4
> > > > > > cpuid = 1
> > > > > > KDB: enter: panic
> > > > > > Dumping 377 out of 6116
> > > > > > MB:..5%..13%..22%..34%..43%..51%..64%..73%..81%..94%
> > > > > >
> > > > > This means that the age driver either produced corrupted mbuf
> > > > > chain,
> > > > > or filled wrong negative value into the mbuf len field. I am
> > > > > quite
> > > > > certain that the issue is in the driver.
> > > > >
> > > > > I added the net@ to Cc:, hopefully you could get help there.
> > > >
> > > > And I've cc'd Pyun who has written most of this driver and is
> > > > likely
> > > > the one
> > > > most familiar with its handling of jumbo frames.
> > > >
> > > > --
> > > > John Baldwin
> > > > _______________________________________________
> > > > freebsd-fs@freebsd.org mailing list
> > > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > To unsubscribe, send any mail to
> > > > "freebsd-fs-unsubscribe@freebsd.org"
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"