Date:      Sun, 3 Jan 2016 20:37:13 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Mikhail T." <mi+thun@aldan.algebra.com>
Cc:        Karli Sjöberg <karli.sjoberg@slu.se>, freebsd-fs@FreeBSD.org
Subject:   Re: NFS reads vs. writes
Message-ID:  <495055121.147587416.1451871433217.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <5688D3C1.90301@aldan.algebra.com>
References:  <8291bb85-bd01-4c8c-80f7-2adcf9947366@email.android.com> <5688D3C1.90301@aldan.algebra.com>

Mikhail T. wrote:
> On 03.01.2016 02:16, Karli Sjöberg wrote:
> >
> > The difference between "mount" and "mount -o async" should tell you if
> > you'd benefit from a separate log device in the pool.
> >
> This is not a ZFS problem. The same filesystem is being read in both
> cases. The same data is being read from and written to the same
> filesystems. For some reason, it is much faster to read via NFS than to
> write to it, however.
>
This issue isn't new. It showed up when Sun introduced NFS in 1985.
NFSv3 did change things a little, by allowing UNSTABLE writes.
Here's what an NFSv3 or NFSv4 client does when writing:
- Issues some # of UNSTABLE writes. The server need only have these in server
  RAM before replying NFS_OK.
- Then the client does a Commit. At this point the NFS server is required to
  store all the data written in the above writes and related metadata on stable
  storage before replying NFS_OK.
  --> This is where "sync" vs "async" is a big issue. If you use "sync=disabled"
      (I'm not a ZFS guy, but I think that is what the ZFS option looks like) you
      *break* the NFS protocol (ie. violate the RFC) and put your data at some risk,
      but you will typically get better (often much better) write performance.
      OR
      You put a ZIL on a dedicated device with fast write performance, so the data
      can go there to satisfy the stable storage requirement. (I know nothing
      about them, but SSDs have dramatically different write performance, so an SSD
      to be used for a ZIL must be carefully selected to ensure good write
      performance.) Example commands for both options are sketched after this list.
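
As a rough sketch (assuming a pool called "tank" with a dataset "tank/export"
and a spare SSD at /dev/ada1; both names are placeholders, adjust to your setup):

    # Option 1: disable synchronous semantics (breaks the NFS stable-storage
    # guarantee; recently written data can be lost if the server crashes)
    zfs set sync=disabled tank/export

    # Option 2: add a dedicated fast SSD as a separate log (SLOG) device, so
    # a Commit can be satisfied from stable storage quickly
    zpool add tank log /dev/ada1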

How many writes are in "some #" is up to the client. For FreeBSD clients, the "wcommitsize"
mount option can be used to adjust this. Recently the default tuning of this changed
significantly, but you didn't mention how recent your system(s) are, so manual tuning of
it may be useful. (See "man mount_nfs" for more on this.)
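
For example (the server path is a placeholder, and the 1MB value is only a
starting point to experiment with, not a recommendation):

    mount -t nfs -o nfsv3,wcommitsize=1048576 server:/export /mnt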

Also, the NFS server was recently tweaked so that it could handle 128K rsize/wsize,
but the FreeBSD client is limited to MAXBSIZE and this has not been increased
beyond 64K. To do so, you have to change the value of this in the kernel sources
and rebuild your kernel. (The problem is that increasing MAXBSIZE makes the kernel
use more KVM for the buffer cache and if a system isn't doing significant client
side NFS, this is wasted.)
Someday, I should see if MAXBSIZE can be made a TUNABLE, but I haven't done that.
--> As such, unless you use a Linux NFS client, the reads/writes will be 64K, whereas
    128K would work better for ZFS.
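
If you do rebuild with a larger MAXBSIZE (I believe the constant lives in
sys/sys/param.h in the kernel sources), you can then ask the client for the
bigger transfer size, something like:

    # assumes a kernel rebuilt with MAXBSIZE >= 131072; otherwise the client
    # will limit rsize/wsize to 64K anyway
    mount -t nfs -o nfsv3,rsize=131072,wsize=131072 server:/export /mnt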

Some NAS hardware vendors solve this problem by using non-volatile RAM, but that
isn't available in generic hardware.

> And finally, just to put the matter to rest, both ZFS-pools already have
> a separate zil-device (on an SSD).
>
If this SSD is dedicated to the ZIL and is one known to have good write performance,
it should help, but in your case the SSD seems to be the bottleneck.
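
One quick way to check (substitute your pool name for "tank") is to watch the
log device while one of your write tests runs:

    zpool iostat -v tank 1   # per-vdev throughput; watch the "logs" section
    gstat -p                 # per-disk %busy; shows whether the SSD is saturated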

rick

>     -mi


