Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 Dec 2001 13:45:18 -0800
From:      Conrad Minshall <conrad@apple.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Jordan Hubbard <jkh@winston.freebsd.org>, Peter Wemm <peter@wemm.org>, Mike Smith <msmith@freebsd.org>, hackers@freebsd.org, msmith@mass.dis.org
Subject:   Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD    fall on its face in one easy step )
Message-ID:  <l03130307b8440fd3aac2@[17.219.180.26]>
In-Reply-To: <200112162024.fBGKOSt22277@apollo.backplane.com>
References:  <58885.1008217148@winston.freebsd.org> <l03130304b8413ae1338b@[17.219.180.26]> <l03130302b841fdae1ebc@[17.219.180.26]>

next in thread | previous in thread | raw e-mail | index | archive | help
At 12:24 PM -0800 12/16/01, Matthew Dillon wrote:

>    program runs fine in an overnight test.  We still have a known issue
>    with out-of-order operations from nfsiod's that apparently may come
>    up after a week or so of testing.  I asked Jordan to try to track down
>    the NeXT guy who fixed that one in the old NFS stack.

This bug showed up recently here with fsx testing.  I seem to have fixed it
last week in MacOS X.  The diffs were widespread but the idea was simple
enough so a few code snippets should suffice:

nfs_request gets a new argument (u_int64_t *xidp) and fills it in here:

        m = nfsm_rpchead(cred, nmp->nm_flag, procnum, auth_type, auth_len,
             auth_str, verf_len, verf_str, mrest, mrest_len, &mheadend, &xid);
        if (xidp)
                *xidp = xid + ((u_int64_t)nfs_xidwrap << 32);

nfsm_rpchead bumps nfs_xidwrap when avoiding a zero xid.

Callers of nfs_request take the returned xid and pass it via the macros to
nfs_loadattrcache, from which the following code is snipped:

        if (*xidp < np->n_xid) {
                /*
                 * We have already updated attributes with a response from
                 * a later request.  The attributes we have here are probably
                 * stale so we drop them (just return).  However, our
                 * out-of-order receipt could be correct - if the requests were
                 * processed out of order at the server.  Given the uncertainty
                 * we invalidate our cached attributes.  *xidp is zeroed here
                 * to indicate the attributes were dropped - only getattr
                 * cares - it needs to retry the rpc.
                 */
                np->n_attrstamp = 0;
                *xidp = 0;
                return (0);
        }

Further down in nfs_loadattrcache:

        np->n_xid = *xidp;

Note xids are kept in a 64 bit form so relative comparison won't fail in
the unlikely case that xids wrap around zero.

Here's the change in nfs_getattr.  I don't expect to ever see the panic.

        avoidfloods = 0;
tryagain:
        nfsstats.rpccnt[NFSPROC_GETATTR]++;
        nfsm_reqhead(vp, NFSPROC_GETATTR, NFSX_FH(v3));
        nfsm_fhtom(vp, v3);
        nfsm_request(vp, NFSPROC_GETATTR, ap->a_p, ap->a_cred, &xid);
        if (!error) {
                nfsm_loadattr(vp, ap->a_vap, &xid);
                if (!xid) { /* out-of-order rpc - attributes were dropped */
                        m_freem(mrep);
                        if (avoidfloods++ < 100)
                                goto tryagain;
                        /*
                         * avoidfloods>1 is bizarre.  at 100 pull the plug
                         */
                        panic("nfs_getattr: getattr flood\n");
                }






--
Conrad Minshall, conrad@apple.com, 408 974-2749
Apple Computer, Mac OS X Core Operating Systems



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?l03130307b8440fd3aac2>