Date:      Sun, 28 Feb 2010 12:21:28 +0000
From:      "Robert N. M. Watson" <rwatson@freebsd.org>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        stable@freebsd.org, freebsd-fs@freebsd.org, Gerrit Kühn <gerrit@pmp.uni-hannover.de>, Willem Jan Withagen <wjw@digiware.nl>, Eirik Øverby <ltning@anduin.net>, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject:   Re: mbuf leakage with nfs/zfs? 
Message-ID:  <F794BAF9-4ADC-48F8-ACC3-C441EF77A8F2@freebsd.org>
In-Reply-To: <E1Nlhzr-0004ci-3W@kabab.cs.huji.ac.il>
References:  <20100226174021.8feadad9.gerrit@pmp.uni-hannover.de> <E1Nl6VA-000557-D9@kabab.cs.huji.ac.il> <20100226224320.8c4259bf.gerrit@pmp.uni-hannover.de> <4B884757.9040001@digiware.nl> <20100227080220.ac6a2e4d.gerrit@pmp.uni-hannover.de> <4B892918.4080701@digiware.nl> <20100227202105.f31cbef7.gerrit@pmp.uni-hannover.de> <20100227193819.GA60576@icarus.home.lan> <BD8AC9F6-DF96-41F9-8E92-48A4E5606DC7@anduin.net> <4B89943C.70704@digiware.nl> <20100227220310.GA65110@icarus.home.lan> <E1Nlhzr-0004ci-3W@kabab.cs.huji.ac.il>

On Feb 28, 2010, at 12:11 PM, Daniel Braniss wrote:

>> I'm pulling in Robert Watson, who has some familiarity with the UDP
>> stack/code in FreeBSD.  I'm not sure he'll be a sufficient source of
>> knowledge for this specific issue since it appears (?) to be specific to
>> NFS; Rick Macklem would be a better choice, but as reported, he's MIA.
>>
>> Robert, are you aware of any changes or implementation issues which
>> might cause excessive (read: leaking) mbuf use under UDP-based NFS?  Do
>> you know of a way folks could determine the source of the leak, either
>> via DDB or while the system is live?
>
> I have been running some tests in a controlled environment.
>
> server and client are both 64bit Xeon/X5550 @  2.67GHz with 16 GB of memory
> FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
>
> the client is running latest 8.0 stable
> the load is created by running 'make -j32 buildworld' and sleeping 150 sec.
> in between runs, this is the straight line you will see in the graphs.
> Both the src and obj directories are NFS mounted from the server, regular UFS.
>
> when server is running 7.2-stable no leakage is seen.
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-7.2.ps
> when server is running 8.0-stable
> see ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbufs/{tcp,udp}-8.0.ps
> you can see that udp is leaking!
>
> cheers,
> 	danny
> ps: I think the subject should be changed again, removing zfs ...

This type of problem (occurs with one client but not another) is almost
always the result of the access pattern of a particular client
triggering a specific (and perhaps single) bug in error-handling. For
example, we might not be properly freeing the received request when
generating an EPERM in an edge case. The hard bit is identifying which
it is. If it's reproducible with UDP, then usually the process is:

- Build a minimal test case to trigger the problem -- ideally with as
  little complexity as possible.
- Run netstat -m at the beginning of the test and the end of the test on
  the server to count the number of leaked mbufs
- Run wireshark throughout the test
- Walk the wireshark trace looking for some error that occurs at about
  the same or slightly lower number of times than the number of mbufs
  leaked
- Iterate, narrowing the test case until it's either obvious exactly
  what's going on, or you've identified a relatively constrained code path
  and can just spot the bug by reading the code
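
The counting and matching steps above can be sketched roughly as follows.
This is only an illustration, not part of the original procedure: it
assumes the netstat -m summary line looks like
"1026/2529/3555 mbufs in use (current/cache/total)", and the helper names,
the 10% slack, and the per-error counts (which would have to be tallied by
hand from the wireshark trace) are all hypothetical.

```python
import re

# Assumed shape of the summary line printed by `netstat -m`, e.g.
#   "1026/2529/3555 mbufs in use (current/cache/total)"
MBUF_LINE = re.compile(r"^\s*(\d+)/(\d+)/(\d+) mbufs in use", re.MULTILINE)


def mbufs_in_use(netstat_output):
    """Return the 'current' mbuf count from captured `netstat -m` text."""
    match = MBUF_LINE.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbufs in use' line found")
    return int(match.group(1))


def leaked_mbufs(before, after):
    """Difference in current mbufs between two `netstat -m` captures."""
    return mbufs_in_use(after) - mbufs_in_use(before)


def candidate_errors(error_counts, leaked, slack=0.1):
    """Names of errors seen about as often as, or slightly less often
    than, the number of leaked mbufs (hypothetical 10% slack)."""
    low = leaked * (1 - slack)
    return sorted(
        name for name, count in error_counts.items()
        if low <= count <= leaked
    )


if __name__ == "__main__":
    # Made-up captures and made-up per-error tallies, for illustration only.
    before = "1026/2529/3555 mbufs in use (current/cache/total)\n"
    after = "5026/2529/7555 mbufs in use (current/cache/total)\n"
    leaked = leaked_mbufs(before, after)
    counts = {"NFS3ERR_STALE": 3950, "NFS3ERR_NOENT": 120}
    print(leaked, candidate_errors(counts, leaked))
```

Any error that survives the filter is only a suspect; the iteration step
still has to confirm it by narrowing the test case.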

It's almost certainly one or a small number of very specific RPCs that
are triggering it -- maybe OpenBSD does an extra lookup, or stat, or
something, on a name that may not exist anymore, or does it sooner than
the other clients. Hard to say, other than to wave hands at the
possibilities.

And it may well be we're looking at two bugs: Danny may see one bug,
perhaps triggered by a race condition, but it may be different from the
OpenBSD client-triggered bug (to be clear: it's definitely a FreeBSD
bug, although we might only see it when an OpenBSD client is used
because perhaps OpenBSD also has a bug or feature).

Robert