Date:      Wed, 10 Dec 2014 14:36:39 +0000
From:      "Loïc Blot" <loic.blot@unix-experience.fr>
To:        "Rick Macklem" <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: High Kernel Load with nfsv4
Message-ID:  <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr>
In-Reply-To: <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>
References:  <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>

Hi Rick,
thanks for your suggestion.
For my locking bug, rpc.lockd is stuck in the rpcrecv state on the
server. kill -9 doesn't affect the process, it's blocked... (State: Ds)

As for the performance:

NFSv3: 60Mbps
NFSv4: 45Mbps

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

10 December 2014 13:56, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
> Loic Blot wrote:
> 
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails start up very well, but now I have an issue with lockd
>> after some minutes:
>> 
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>> 
>> I looked at mbuf usage, but it seems there is no problem there.
> 
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
> 
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
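For reference, Rick's "nolockd" suggestion might look like the sketch
below; the server address and export path are the ones from this thread,
and the option spelling follows mount_nfs(8).

```shell
# Sketch: NFSv3 mount with locks kept client-local (nolockd), so
# rpc.lockd is never consulted over the wire. The 32K transfer
# sizes match the tweak suggested later in the thread.
mount -t nfs -o nfsv3,nolockd,rsize=32768,wsize=32768 \
    10.10.X.8:/jails /jails
```

Note this only helps when no other client needs to see those locks.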
>> Here is my rc.conf on the server:
>> 
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Here is the client:
>> 
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Have you got an idea?
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> 9 December 2014 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>> Loic Blot wrote:
>>> 
>>>> Hi Rick,
>>>> 
>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>> memcached_flags="-v -m 512"
>>>> The command was very, very slow...
>>>> 
>>>> Here is a dd over NFS:
>>>> 
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>> 
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic, which I've been told thinks
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>>> 
> 
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compares with nfsv4. If you do that, please post the
> comparable results.
> 
> Someday I would like to try to get ZFS's sequential vs random read
> heuristic modified, and any info on what difference in performance
> that might make for NFS would be useful.
> 
> rick
> 
>>> rick
>>> 
>>>> This is quite slow...
>>>> 
>>>> You can find some nfsstat output below (the command isn't finished
>>>> yet):
>>>> 
>>>> nfsstat -c -w 1
>>>> 
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
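The NFSv3-vs-NFSv4 dd comparison Rick asks for above could be run
roughly as follows; the mount points are illustrative, and both should
point at the same export.

```shell
# Mount the same export once per protocol (illustrative mount points).
mount -t nfs -o nfsv3 10.10.X.8:/jails /mnt/nfs3
mount -t nfs -o nfsv4 10.10.X.8:/jails /mnt/nfs4

# Read the same large file through each mount; compare the
# bytes/sec figure dd prints at the end of each run.
dd if=/mnt/nfs3/test.dd of=/dev/null bs=1m
dd if=/mnt/nfs4/test.dd of=/dev/null bs=1m
```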
>>> 
>>>> Regards,
>>>> 
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>> 
>>>> 8 December 2014 09:36, "Loïc Blot" <loic.blot@unix-experience.fr>
>>>> wrote:
>>>>> Hi Rick,
>>>>> I stopped the jails this week-end and started them this morning;
>>>>> I'll give you some stats this week.
>>>>> 
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>>>> 
>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,
>>>>> acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,
>>>>> wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,
>>>>> timeout=120,retrans=2147483647
>>>>> 
>>>>> On the server side my disks are behind a RAID controller which
>>>>> shows a 512b volume, and write performance is very honest
>>>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000
>>>>> => 450MBps)
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> 
>>>>> 5 December 2014 15:14, "Rick Macklem" <rmacklem@uoguelph.ca>
>>>>> wrote:
>>>>> 
>>>>>> Loic Blot wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> I'm trying to create a virtualisation environment based on
>>>>>>> jails. Those jails are stored on a big ZFS pool on a FreeBSD
>>>>>>> 9.3 host which exports an NFSv4 volume. This NFSv4 volume is
>>>>>>> mounted on a big hypervisor (2 Xeon E5v3 + 128GB memory and 8
>>>>>>> network ports, but only 1 was used at this time).
>>>>>>> 
>>>>>>> The problem is simple: my hypervisor runs 6 jails (using about
>>>>>>> 1% CPU, 10GB RAM and less than 1MB of bandwidth) and works fine
>>>>>>> at start, but the system slows down and after 2-3 days becomes
>>>>>>> unusable. When I look at top I see 80-100% system time, and
>>>>>>> commands are very, very slow. Many processes are tagged with
>>>>>>> nfs_cl*.
>>>>>> 
>>>>>> To be honest, I would expect the slowness to be caused by slow
>>>>>> response from the NFSv4 server, but if you do:
>>>>>> # ps axHl
>>>>>> on a client when it is slow and post that, it would give us some
>>>>>> more information on where the client side processes are sitting.
>>>>>> If you also do something like:
>>>>>> # nfsstat -c -w 1
>>>>>> and let it run for a while, that should show you how many RPCs
>>>>>> are being done and which ones.
>>>>>> 
>>>>>> # nfsstat -m
>>>>>> will show you what your mount is actually using.
>>>>>> The only mount option I can suggest trying is
>>>>>> "rsize=32768,wsize=32768",
>>>>>> since some network environments have difficulties with 64K.
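If the smaller transfer sizes help, they could be made permanent in the
client's fstab; the line below is a sketch (mount point and rw flag are
assumptions, the sizes are the ones suggested above).

```shell
# /etc/fstab sketch: NFSv4 mount pinned to 32K transfer sizes.
10.10.X.8:/jails  /jails  nfs  rw,nfsv4,rsize=32768,wsize=32768  0  0
```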
>>>>>> 
>>>>>> There are a few things you can try on the NFSv4 server side, if
>>>>>> it appears that the clients are generating a large RPC load:
>>>>>> - disable the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>> - if the server is seeing a large write RPC load, then
>>>>>> "sync=disabled" might help, although it does run a risk of data
>>>>>> loss when the server crashes.
>>>>>> Then there are a couple of other ZFS-related things (I'm not a
>>>>>> ZFS guy, but these have shown up on the mailing lists):
>>>>>> - make sure your volumes are 4K-aligned and ashift=12 (in case a
>>>>>> drive that uses 4K sectors is pretending to be 512-byte sectored)
>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>> - use a ZIL on an SSD with good write performance
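Applied literally, the server-side checklist above might look like this
sketch; the pool/dataset name "tank/jails" is an assumption, and
sync=disabled carries the data-loss risk noted in the thread.

```shell
# Disable the NFS duplicate request cache (DRC) for TCP mounts.
sysctl vfs.nfsd.cachetcp=0

# Only if a heavy write RPC load is confirmed; risks losing
# recently-synced data if the server crashes. Dataset name is
# an assumption.
zfs set sync=disabled tank/jails

# Verify 4K alignment: look for ashift=12 on every vdev.
zdb -C tank | grep ashift
```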
>>>>>> 
>>>>>> The only NFSv4 thing I can tell you is that it is known that
>>>>>> ZFS's algorithm for determining sequential vs random I/O fails
>>>>>> for NFSv4 during writing, and this can be a performance hit. The
>>>>>> only workaround is to use NFSv3 mounts, since file handle
>>>>>> affinity apparently fixes the problem, and this is only done for
>>>>>> NFSv3.
>>>>>> 
>>>>>> rick
>>>>>> 
>>>>>>> I saw that there are TSO issues with igb, so I'm trying to
>>>>>>> disable it with sysctl, but that didn't solve the situation.
>>>>>>> 
>>>>>>> Has someone got ideas? I can give you more information if you
>>>>>>> need it.
>>>>>>> 
>>>>>>> Thanks in advance.
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Loïc Blot,
>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>> http://www.unix-experience.fr
>>>>>>> _______________________________________________
>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>> To unsubscribe, send any mail to
>>>>>>> "freebsd-fs-unsubscribe@freebsd.org"
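On the igb TSO issue mentioned in the original report, the usual
FreeBSD workarounds are per-interface or global; the interface name
igb0 is an assumption.

```shell
# Disable TSO on one interface...
ifconfig igb0 -tso
# ...or globally for TCP (affects new connections).
sysctl net.inet.tcp.tso=0
```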



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?fc9e829cf79a03cd72f21226d276eb78>