Date:      Sun, 11 Apr 2010 10:02:39 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        arch@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: (in)appropriate uses for MAXBSIZE
Message-ID:  <Pine.GSO.4.63.1004110946400.27203@muncher.cs.uoguelph.ca>
In-Reply-To: <20100411114405.L10562@delplex.bde.org>
References:  <4BBEE2DD.3090409@freebsd.org> <Pine.GSO.4.63.1004090941200.14439@muncher.cs.uoguelph.ca> <4BBF3C5A.7040009@freebsd.org> <20100411114405.L10562@delplex.bde.org>



On Sun, 11 Apr 2010, Bruce Evans wrote:

>
> Er, the maximum size of buffers in the buffer cache is especially
> irrelevant for nfs.  It is almost irrelevant for physical disks because
> clustering normally increases the bulk transfer size to MAXPHYS.
> Clustering takes a lot of CPU but doesn't affect the transfer rate much
> unless there is not enough CPU.  It is even less relevant for network
> i/o since there is a sort of reverse-clustering -- the buffers get split
> up into tiny packets (normally 1500 bytes less some header bytes) at
> the hardware level.  Again a lot of CPU is involved doing the (reverse)
> clustering, and again this doesn't affect the transfer rate much.
> However, 1500 is so tiny that the reverse-clustering ratio of the i/o
> size relative to MAXBSIZE (65536/1500) is much larger than the normal
> clustering ratio relative to MAXBSIZE (131072/65536), and the extra CPU
> is more significant for network i/o.  (These aren't the actual normal
> ratios, but the limits attainable by varying only the block sizes under
> the file system's control.)  However2, increasing the
> network i/o size can make little difference to this problem -- it can
> only increase the already-too-large reverse-clustering ratio, while
> possibly reducing other reverse-clustering ratios (the others are for
> assembling the nfs buffers from local file system buffers; the local
> file system buffers are normally disassembled from pbuf size (MAXPHYS)
> to file system size (normally 16K); then conversion to nfs buffers
> involves either a sort of clustering or reverse clustering depending
> on the relative sizes of the buffers).  There are more gains to be
> had from increasing the network i/o size.  tcp allows larger buffers
> at intermediate levels but they still get split up at the hardware
> level.  Only some networks allow jumbo frames.
>
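For a rough sense of the ratios above, here is a small sketch (not kernel
code) using the usual MAXPHYS = 128K, MAXBSIZE = 64K and a 1500-byte
Ethernet MTU; the constants are the conventional values, not read from any
particular kernel configuration:

```python
# Rough check of the clustering ratios discussed above.  MAXPHYS,
# MAXBSIZE and the MTU are the usual FreeBSD/Ethernet values, not
# taken from any particular kernel config.
MAXPHYS = 128 * 1024   # 131072: max physical (clustered) transfer size
MAXBSIZE = 64 * 1024   # 65536: max buffer cache block size
MTU = 1500             # Ethernet payload size, before header bytes

# Normal clustering: MAXBSIZE buffers combined into one MAXPHYS transfer.
cluster_ratio = MAXPHYS / MAXBSIZE    # 2.0

# Reverse clustering: one MAXBSIZE buffer split into ~MTU-sized packets.
reverse_ratio = MAXBSIZE / MTU        # ~43.7

print(cluster_ratio, reverse_ratio)
```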

I've done a simple experiment on Mac OS X 10, where I tried different
sizes for the read and write RPCs plus different amounts of
read-ahead/write-behind, and found that the I/O rate increased linearly
with RPC size, up to the max allowed by Mac OS X (MAXBSIZE == 128K), when
no read-ahead/write-behind was used. With read-ahead/write-behind the
performance didn't increase at all, until the RPC read/write size was
reduced. (Solaris 10 now uses 256K by default and allows up to 1MB for
the read/write RPC size, so they seem to think that large values work well?)

When you start using a WAN environment, large read/write RPCs really
help, from what I've seen, since they help fill the TCP pipe
(bandwidth * latency between client<->server).
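To make the "filling the TCP pipe" arithmetic concrete, here's a small
sketch using made-up WAN numbers (a 100 Mbit/s link and a 50 ms round-trip
time; both are illustrative assumptions, not measurements):

```python
import math

# How many read/write RPCs must be in flight to keep a WAN link busy.
# The link speed and RTT below are illustrative assumptions.
link_bps = 100_000_000   # 100 Mbit/s WAN link
rtt_s = 0.05             # 50 ms client<->server round-trip time

# Bandwidth-delay product: bytes that must be in flight to fill the pipe.
bdp_bytes = link_bps / 8 * rtt_s   # 625000 bytes

# Outstanding RPCs needed to cover the BDP, for various RPC sizes.
rpcs_needed = {size: math.ceil(bdp_bytes / size)
               for size in (8 * 1024, 64 * 1024, 1024 * 1024)}
print(rpcs_needed)   # {8192: 77, 65536: 10, 1048576: 1}
```

The point being that with 1MB RPCs a single outstanding request can fill
this pipe, while 8K RPCs need dozens of read-aheads/write-behinds in flight.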

I care much more about WAN performance than LAN performance w.r.t. this.

I am not sure what you were referring to w.r.t. clustering, but if you
meant that the NFS client can easily do an RPC with a larger I/O size
than the size of the buffer handed it by the buffer cache, I'd like to
hear how that's done? (If not, then a bigger buffer from the buffer
cache is what I need to do a larger I/O size in the RPC.)

Once NFS hands the TCP socket the large RPC, I figure it's up to the
networking to get it on/off the wire, etc. If you are arguing that that
is where there can be major gains, I'll believe you, but it's not my
area of expertise and there's lots of other FreeBSD folks to work on
that. I do believe that being able to do a large read/write RPC is
going to help performance, particularly in the WAN case.

>>> Using
>>> larger I/O sizes for NFS is a simpler way to increase bulk data transfer
>>> rate than more buffers and more aggressive read-ahead/write-behind.
>
> I'm not sure about that.  Read-ahead and write-behind is already very
> aggressive but seems to be not working right.  I use some patches by
> Bjorn Groenwald (?) which make it work better for the old nfs implementation
> (I haven't tried the experimental one).  The problems seem to be mainly
> timing ones.  vfs clustering makes the buffer sizes almost irrelevant for
> physical disks, but there are latency problems for the network i/o.
> The latency problems seem to be larger for reads than for writes.  I
> get best results by using the same size for network buffers as for local
> buffers (16K).  This avoids 1 layer of buffer size changing (see above)
> and using 16K-buffers avoids buffer kva fragmentation (see below).  I
> saw little difference from changing the user buffer size, except small
> buffers tend to work better and smallest (512-byte) buffers may have
> actually worked best, I think by reducing latencies.
>

See above. There are always going to be cases, like use over a WAN, where
latency is large. That's when large I/O RPCs will win.

I suspect you are focusing on the high-bandwidth/low-latency LAN case,
which is not where I believe that large I/O RPCs will make much
difference.

Hope this helps clarify what I am looking for, rick


