Date:      Tue, 28 Jan 2014 19:06:49 -0500
From:      J David <j.david.lists@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <CABXB=RQksyZq43=jLw3wJT5vLzuK4h5cgE=Lj4caq1RgOBa8gA@mail.gmail.com>
In-Reply-To: <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca>
References:  <20140128021450.GY13704@funkthat.com> <372707859.17587309.1390923341323.JavaMail.root@uoguelph.ca>

On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Since messages are sent quickly and then mbufs released, except for
> the DRC in the server, I think avoiding large allocations for server
> replies that may be cached is the case to try and avoid. Fortunately
> the large replies will be for read and readdir and these don't need
> to be cached by the DRC. As such, a patch that uses 4K clusters in
> the server for read, readdir and 4K clusters for write requests in
> the client, should be appropriate, I think?

m_getm2 appears to consistently produce "right-sized" results.  The
relevant code is:

    while (len > 0) {
        if (len > MCLBYTES)
            mb = m_getjcl(how, type, (flags & M_PKTHDR),
                MJUMPAGESIZE);
        else if (len >= MINCLSIZE)
            mb = m_getcl(how, type, (flags & M_PKTHDR));
        else if (flags & M_PKTHDR)
            mb = m_gethdr(how, type);
        else
            mb = m_get(how, type);
        /* ... */
    }

So it allocates the shortest possible chain and uses the best-fit
cluster for the last (or only) block in the chain.

It's probably the use of this function in m_uiotombuf or somewhere
very similar that prevents tools like iperf from encountering this
same issue.

Getting this same logic into the NFS code seems like it would be a
good thing, in terms of reducing code duplication, increasing
performance, and leveraging a well-tested code path.

It may raise portability concerns, but other OS's to which the NFS code
could potentially be ported likely have similar mechanisms these days.
It might be worthwhile to examine whether the NFS code could choose a
slightly different point of abstraction.  Or, if that's undesirable, it
is probably reasonable to ask the hypothetical person doing such a port
to cross that bridge when they come to it, since that person would be
the one most likely to be intimately familiar with the relevant details
of both OS's.

Also, looking at GAWollman's patch, an mbuf+cluster allocator that
kicks back a prewired iovec seems really handy.  Is that something
that would be useful elsewhere in the kernel, or is NFS just kind of a
special case because it's just moving data around, not across weird
boundaries like device drivers and anything user mode-facing does?

Thanks!


