Date:      Mon, 7 Dec 2009 16:28:52 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Robert N. M. Watson" <rwatson@FreeBSD.org>
Cc:        pyunyh@gmail.com, dfr@FreeBSD.org, weldon@excelsusphoto.com, freebsd-current@FreeBSD.org, Eirik Øverby <ltning@anduin.net>, Gavin Atkinson <gavin@FreeBSD.org>
Subject:   Re: FreeBSD 8.0 - network stack crashes?
Message-ID:  <Pine.GSO.4.63.0912071623300.11928@muncher.cs.uoguelph.ca>
In-Reply-To: <BA47FDA1-1097-4C43-AF71-51E7227795B5@FreeBSD.org>
References:  <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net> <20091129013026.GA1355@michelle.cdnetworks.com> <74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net> <alpine.BSF.2.00.0911291427240.80654@fledge.watson.org> <34AD565D-814A-446A-B9CA-AC16DD762E1B@anduin.net> <A0C9ED20-5536-44E2-B26B-0F1AEC2AF79C@anduin.net> <BA47FDA1-1097-4C43-AF71-51E7227795B5@FreeBSD.org>

On Mon, 30 Nov 2009, Robert N. M. Watson wrote:

>
> On 30 Nov 2009, at 05:36, Eirik Øverby wrote:
>
>> Short follow-up: Making OpenBSD use TCP mounts (it defaults to UDP)
>> seems to solve the issue.
>>
>> So this is a UDP-NFS-related problem, it would seem?
>
> Could well be. Let's try another debugging tactic -- there are two
> possible things going on here: a resource leak, and resource exhaustion
> leading to deadlock. If you shut down to single-user mode from
> multi-user, let the system quiesce for a few minutes, and then run
> netstat -m, what does it look like? Do vast numbers of mbufs+clusters
> get freed, or do they remain accounted for as allocated?
>
> (If they remain allocated, they were likely leaked, since most or all
> sockets will have been closed, releasing their resources, when all
> processes are killed on shutdown to single-user mode.)
>
> An mbuf leak in NFS isn't an unlikely theory -- the socket code there
> continues to change, and rare edge cases frequently lead to leaks (per
> my earlier e-mail). Perhaps there's a case the OpenBSD client is
> triggering that other NFS clients normally don't. If we think that's
> the case, the next step is usually to narrow down what makes the leak
> trigger a lot (i.e., the backup starting), and then grab a packet trace
> that we can analyze with Wireshark. We'll want to look at the types of
> errors being returned for RPCs and, in particular, if one of them
> occurs about as many times as the resource has leaked over the same
> window, look at the code and see whether that error case is handled
> properly.
>
> If this is definitely an NFS leak bug, we should get the NFS folks'
> attention by sticking "NFS mbuf leak" in the subject line and CC'ing
> rmacklem/dfr. :-)
>
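The two-step diagnosis described in the quoted message (first leak versus exhaustion after quiescing, then matching an RPC error count against the size of the leak over the same window) can be sketched as a pair of C helpers. This is only an illustration under stated assumptions: struct rpc_error_count, looks_like_leak(), best_matching_error() and the threshold/tolerance parameters are hypothetical, and the inputs are assumed to be numbers read by hand from netstat -m and tallied from a Wireshark trace, not parsed automatically.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tally of one RPC error type seen in a packet trace. */
struct rpc_error_count {
	const char *name;	/* hypothetical label for the error type */
	long count;		/* occurrences during the measurement window */
};

/*
 * Step 1: leak vs. exhaustion. in_use_busy and in_use_quiesced are mbuf
 * counts read by hand from netstat -m before and after quiescing in
 * single-user mode. If less than freed_threshold of them were released,
 * treat the remainder as likely leaked.
 */
static int looks_like_leak(long in_use_busy, long in_use_quiesced,
    double freed_threshold)
{
	double freed;

	if (in_use_busy <= 0)
		return (0);
	freed = (double)(in_use_busy - in_use_quiesced) / (double)in_use_busy;
	return (freed < freed_threshold);
}

/*
 * Step 2: return the index of the error type whose count is closest to
 * leak_delta (mbufs leaked over the same window), provided it lies within
 * tolerance * leak_delta; otherwise return -1.
 */
static int best_matching_error(const struct rpc_error_count *errs, size_t n,
    long leak_delta, double tolerance)
{
	int best = -1;
	long best_diff = 0;

	for (size_t i = 0; i < n; i++) {
		long diff = labs(errs[i].count - leak_delta);

		if ((double)diff > tolerance * (double)leak_delta)
			continue;
		if (best < 0 || diff < best_diff) {
			best = (int)i;
			best_diff = diff;
		}
	}
	return (best);
}
```

A match is only a lead: it points at the error path worth reading in the code, not proof that this path is the one leaking.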
It's a bit of a shot in the dark, but could you please test the following
patch? It fixes a possible mbuf leak and a possible M_SONAME leak (I have
no idea whether these ever occur in practice). It also fixes a case where
svc_dg_reply() would have returned TRUE even though sosend() failed. That
was all I could see from a quick look.

rick
--- rpc/svc_dg.c.sav	2009-12-07 15:37:45.000000000 -0500
+++ rpc/svc_dg.c	2009-12-07 15:48:50.000000000 -0500
@@ -221,6 +221,8 @@
 	xdrmbuf_create(&xdrs, mreq, XDR_DECODE);
 	if (! xdr_callmsg(&xdrs, msg)) {
 		XDR_DESTROY(&xdrs);
+		if (raddr != NULL)
+			free(raddr, M_SONAME);
 		return (FALSE);
 	}
 
@@ -259,11 +261,13 @@
 		m_fixhdr(mrep);
 		error = sosend(xprt->xp_socket, addr, NULL, mrep, NULL,
 		    0, curthread);
-		if (!error) {
-			stat = TRUE;
+		if (error) {
+			stat = FALSE;
 		}
 	} else {
 		m_freem(mrep);
+		if (m != NULL)
+			m_freem(m);
 	}
 
 	XDR_DESTROY(&xdrs);
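The error-path ownership pattern this patch applies can be sketched in userland C. This is a minimal sketch, not kernel code: track_alloc(), track_free(), recv_sketch() and reply_sketch() are hypothetical stand-ins for malloc(9)/free(9), mbuf(9), svc_dg_recv() and svc_dg_reply(), and a counter of live allocations plays the role of netstat -m. It also assumes, as labelled in the comments, that a successfully encoded reply carries m with it and that sending consumes the whole chain whether or not it fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for netstat -m: number of live allocations. */
static int outstanding = 0;

/* Hypothetical allocator wrappers standing in for malloc(9)/mbuf(9). */
static void *track_alloc(size_t n)
{
	void *p = malloc(n);

	if (p != NULL)
		outstanding++;
	return (p);
}

static void track_free(void *p)
{
	if (p != NULL) {
		free(p);
		outstanding--;
	}
}

/*
 * Like svc_dg_recv(): raddr is owned here until it is handed back via
 * *addrp, so the decode-failure return must release it (first hunk).
 */
static bool recv_sketch(bool decode_ok, void **addrp)
{
	void *raddr = track_alloc(16);	/* plays the M_SONAME address */

	if (!decode_ok) {
		track_free(raddr);	/* without this, raddr leaks */
		return (false);
	}
	*addrp = raddr;			/* caller now owns raddr */
	return (true);
}

/*
 * Like svc_dg_reply(): report send failures to the caller and free the
 * body m if it never made it into the reply chain (second hunk).
 * Assumption: when encoding succeeds, m is appended to mrep and the send
 * consumes the whole chain whether or not it fails.
 */
static bool reply_sketch(bool encode_ok, int send_error, void *m)
{
	bool stat = encode_ok;
	void *mrep = track_alloc(32);	/* plays the reply header mbuf */

	if (stat) {
		track_free(m);		/* chain consumed by the "send" */
		track_free(mrep);
		if (send_error != 0)
			stat = false;	/* previously left TRUE */
	} else {
		track_free(mrep);
		track_free(m);		/* not appended, so free it here */
	}
	return (stat);
}
```

Dropping the free from either failure branch leaves the counter above zero after the call, which is the same symptom the single-user netstat -m check looks for.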