From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 14:36:47 2014
Date: Wed, 10 Dec 2014 14:36:39 +0000
From: "Loïc Blot"
Subject: Re: High Kernel Load with nfsv4
To: "Rick Macklem"
Cc: freebsd-fs@freebsd.org

Hi Rick,
Thanks for your suggestion.
For my locking bug, rpc.lockd is stuck in the rpcrecv state on the
server. kill -9 doesn't affect the process; it's blocked (state: Ds).

As for performance:

NFSv3: 60 Mbps
NFSv4: 45 Mbps

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 10 December 2014 at 13:56, "Rick Macklem" wrote:

> Loic Blot wrote:
> 
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails start very well, but now I have an issue with lockd
>> after a few minutes:
>> 
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>> 
>> I looked at mbuf usage, but it seems there is no problem there.
> 
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
> 
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
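> As a sketch (same server and export as above; the option list is only
> an example), that would be a client mount like:
> 
> # mount -t nfs -o nfsv3,nolockd 10.10.X.8:/jails /jails
> 
> With "nolockd", fcntl()/lockf() locks are handled locally in the
> client's kernel instead of being forwarded to rpc.lockd, so they are
> never seen by the server or by other clients.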
> 
>> Here is my rc.conf on the server:
>> 
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> And here is the client:
>> 
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Have you got any ideas?
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> On 9 December 2014 at 04:31, "Rick Macklem" wrote:
>>> Loic Blot wrote:
>>> 
>>>> Hi Rick,
>>>> 
>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>> memcached_flags="-v -m 512"
>>>> The command was very, very slow...
>>>> 
>>>> Here is a dd over NFS:
>>>> 
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>> 
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic which, I've been told, thinks
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
> 
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compares with nfsv4. If you do that, please post the
> comparable results.
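> For example (mount points and test file name here are only
> illustrative; use a big test file and a fresh mount for each run so
> client caching doesn't skew the numbers):
> 
> # mount -t nfs -o nfsv3 10.10.X.8:/jails /mnt/v3
> # dd if=/mnt/v3/test.dd of=/dev/null bs=1m
> # mount -t nfs -o nfsv4 10.10.X.8:/jails /mnt/v4
> # dd if=/mnt/v4/test.dd of=/dev/null bs=1m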
> 
> Someday I would like to try to get ZFS's sequential vs random read
> heuristic modified, and any info on what difference in performance
> that might make for NFS would be useful.
> 
> rick
> 
>>> rick
>>> 
>>>> This is quite slow...
>>>> 
>>>> You can find some nfsstat output below (the command isn't finished
>>>> yet):
>>>> 
>>>> nfsstat -c -w 1
>>>> 
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
>>>> 
>>>> Regards,
>>>> 
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>> 
>>>> On 8 December 2014 at 09:36, "Loïc Blot" wrote:
>>>>> Hi Rick,
>>>>> I stopped the jails this weekend and started them again this
>>>>> morning; I'll give you some stats this week.
>>>>> 
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>>>> 
>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,
>>>>> acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,
>>>>> wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,
>>>>> timeout=120,retrans=2147483647
>>>>> 
>>>>> On the server side my disks sit behind a RAID controller which
>>>>> presents a 512b volume, and write performance is very honest
>>>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000
>>>>> => 450MBps).
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> 
>>>>> On 5 December 2014 at 15:14, "Rick Macklem" wrote:
>>>>> 
>>>>>> Loic Blot wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> I'm trying to create a virtualisation environment based on
>>>>>>> jails. The jails are stored on a big ZFS pool on a FreeBSD 9.3
>>>>>>> server which exports an NFSv4 volume. This NFSv4 volume is
>>>>>>> mounted on a big hypervisor (2 Xeon E5v3 + 128GB memory and 8
>>>>>>> network ports, though only 1 is used at this time).
>>>>>>> 
>>>>>>> The problem is simple: the hypervisor runs 6 jails (using about
>>>>>>> 1% CPU, roughly 10GB RAM and less than 1MB of bandwidth) and
>>>>>>> works fine at the start, but the system slows down and after
>>>>>>> 2-3 days becomes unusable. When I look at top I see 80-100%
>>>>>>> system CPU and commands are very, very slow. Many processes are
>>>>>>> tagged with nfs_cl*.
>>>>>> 
>>>>>> To be honest, I would expect the slowness to be because of slow
>>>>>> response from the NFSv4 server, but if you do:
>>>>>> # ps axHl
>>>>>> on a client when it is slow and post that, it would give us some
>>>>>> more information on where the client-side processes are sitting.
>>>>>> If you also do something like:
>>>>>> # nfsstat -c -w 1
>>>>>> and let it run for a while, that should show you how many RPCs
>>>>>> are being done and which ones.
>>>>>> 
>>>>>> # nfsstat -m
>>>>>> will show you what your mount is actually using.
>>>>>> The only mount option I can suggest trying is
>>>>>> "rsize=32768,wsize=32768",
>>>>>> since some network environments have difficulties with 64K.
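>>>>>> On the client that would be a mount along these lines (the
>>>>>> mount point here is just an example):
>>>>>> 
>>>>>> # mount -t nfs -o nfsv4,rsize=32768,wsize=32768 10.10.X.8:/jails /jails
>>>>>> 
>>>>>> and afterwards "nfsstat -m" should report the smaller
>>>>>> rsize/wsize.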
>>>>>> 
>>>>>> There are a few things you can try on the NFSv4 server side, if
>>>>>> it appears that the clients are generating a large RPC load
>>>>>> (a rough sketch of these knobs follows the list):
>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>> - if the server is seeing a large write RPC load, then
>>>>>> "sync=disabled" might help, although it does run a risk of data
>>>>>> loss when the server crashes.
>>>>>> Then there are a couple of other ZFS related things (I'm not a
>>>>>> ZFS guy, but these have shown up on the mailing lists):
>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>>>> drive that uses 4K sectors is pretending to be 512byte sectored)
>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>> - use a zil on an SSD with good write performance
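>>>>>> The sketch, with hypothetical pool, dataset and log device
>>>>>> names (and test sync=disabled carefully before trusting it):
>>>>>> 
>>>>>> Disable the DRC for TCP mounts:
>>>>>> # sysctl vfs.nfsd.cachetcp=0
>>>>>> 
>>>>>> Trade write durability for throughput on the exported dataset:
>>>>>> # zfs set sync=disabled tank/jails
>>>>>> 
>>>>>> Check the pool's ashift (you want 12 on 4K-sector drives):
>>>>>> # zdb -C tank | grep ashift
>>>>>> 
>>>>>> Put the ZIL on a fast SSD:
>>>>>> # zpool add tank log ada4p1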
>>>>>> 
>>>>>> The only NFSv4 thing I can tell you is that it is known that
>>>>>> ZFS's algorithm for determining sequential vs random I/O fails
>>>>>> for NFSv4 during writing, and this can be a performance hit. The
>>>>>> only workaround is to use NFSv3 mounts, since file handle
>>>>>> affinity apparently fixes the problem, and this is only done for
>>>>>> NFSv3.
>>>>>> 
>>>>>> rick
>>>>>> 
>>>>>>> I saw that there are TSO issues with igb, so I tried disabling
>>>>>>> TSO with sysctl, but that didn't solve the situation.
>>>>>>> 
>>>>>>> Has anyone got ideas? I can give you more information if you
>>>>>>> need it.
>>>>>>> 
>>>>>>> Thanks in advance.
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Loïc Blot,
>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>> http://www.unix-experience.fr