From: Adam Guimont
Date: Wed, 01 Apr 2015 15:03:41 -0500
To: freebsd-fs@freebsd.org
Subject: NFSD high CPU usage

I have an issue where NFSD will max out the CPU (1200% in this case) when a client workstation runs out of memory while trying to write via NFS. When this happens, the TCP Recv-Q on the server also fills up, which causes connection timeouts for any other client trying to use the NFS server.

I can reproduce the issue by running stress on a low-end client workstation: change into the NFS-mounted directory and then use stress to write via NFS while exhausting memory, for example:

stress --cpu 2 --io 4 --vm 20 --hdd 4

The client workstation eventually runs out of memory while writing into the NFS directory, the TCP Recv-Q on the NFS server fills up, and then NFSD maxes out the CPU.

The actual client workstations (~50) are not running stress when this happens; they run a mixture of EDA tools (simulation and verification).

For what it's worth, this is how I've been monitoring the TCP buffer queues, where "xx.xxx.xx.xxx" is the IP address of the NFS server:

cmdwatch -n1 'netstat -an | grep -e "Proto" -e "tcp4" | grep -e "Proto" -e "xx.xxx.xx.xxx.2049"'

I have tried several tuning recommendations, but none of them has solved the problem.

Has anyone else experienced this, and is anyone able to reproduce it?
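If cmdwatch is not installed, a plain /bin/sh loop along the following lines should capture the same information (a rough sketch only; the server IP is a placeholder and the one-second interval is arbitrary):

#!/bin/sh
# Rough sketch: sample the NFS server's TCP queues and nfsd CPU usage
# once per second. SERVER is a placeholder for the NFS server's IP.
SERVER="xx.xxx.xx.xxx"
while true; do
    date
    # Recv-Q/Send-Q for all TCP sessions on the NFS port (2049)
    netstat -an | grep -e "Proto" -e "${SERVER}.2049"
    # CPU usage of the nfsd threads ([n] keeps grep from matching itself)
    ps -ax -o pid,%cpu,command | grep "[n]fsd"
    sleep 1
done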
---
NFS server specs:

OS     = FreeBSD 10.0-RELEASE
CPU    = E5-1650 v3
Memory = 96GB
Disks  = 24x ST6000NM0034 in 4x raidz2
HBA    = LSI SAS 9300-8i
NIC    = Intel 10Gb X540-T2

---
/boot/loader.conf

autoboot_delay="3"
geom_mirror_load="YES"
mpslsi3_load="YES"
cc_htcp_load="YES"

---
/etc/rc.conf

hostname="***"
ifconfig_ix0="inet *** netmask 255.255.248.0 -tso -vlanhwtso"
defaultrouter="***"
sshd_enable="YES"
ntpd_enable="YES"
zfs_enable="YES"
sendmail_enable="NO"
nfs_server_enable="YES"
nfs_server_flags="-h *** -t -n 128"
nfs_client_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
samba_enable="YES"
atop_enable="YES"
atop_interval="5"
zabbix_agentd_enable="YES"

---
/etc/sysctl.conf

vfs.nfsd.server_min_nfsvers=3
vfs.nfsd.cachetcp=0
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvspace=1048576
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.keepidle=10000
net.inet.tcp.keepintvl=2500
net.inet.tcp.always_keepalive=1
net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.cc.htcp.adaptive_backoff=1
net.inet.tcp.cc.htcp.rtt_scaling=1
net.inet.tcp.sack.enable=0
kern.ipc.soacceptqueue=1024
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=1300
net.inet.tcp.tso=0

---
Client workstations:

OS = CentOS 6.6 x64

Mount options from `cat /proc/mounts`:

rw,nosuid,noatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=***,mountvers=3,mountport=916,mountproto=udp,local_lock=none,addr=***

---
Regards,

Adam Guimont
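P.S. As a sanity check, the tunables above can be confirmed at runtime with sysctl(8), using the same names as in /etc/sysctl.conf; nfsstat's -e/-s flags show the new NFS server's counters (a minimal sketch, not a full checklist):

# Verify the socket-buffer and NFS-related tunables took effect at runtime
sysctl kern.ipc.maxsockbuf net.inet.tcp.recvbuf_max net.inet.tcp.recvspace
sysctl vfs.nfsd.server_min_nfsvers vfs.nfsd.cachetcp
# Server-side statistics for the new NFS server
nfsstat -e -s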