Date: Tue, 28 Feb 2006 22:55:57 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Danny Braniss
Cc: freebsd-scsi@FreeBSD.org
Subject: Re: Qlogic fibre channel support questions

On Tue, 28 Feb 2006, Danny Braniss wrote:

>> On Mon, 27 Feb 2006, Matthew Jacob wrote:
>>
>>> Okay- let me ask why diskless booting doesn't work for you?
>>
>> Because NFS is slow.  A local disk (or a SAN-attached disk, which is
>> essentially the same to FreeBSD) is going to be faster than NFS, no
>> matter what.
>
> don't be too hasty with conclusions :-)
> 2- as to speed, it all depends, especially on how deep your pockets are.
>    i've been running several 'benchmarks' lately and disk speed is not
>    everything.
>
> sample:
>    host is a Sun Fire X4200 (dual dual-core Opteron) with SAS disks
>    OS is FreeBSD 6.1-PRERELEASE amd64.
>
>    make buildworld:
>    diskless:               40m16.71s real  54m18.55s user  17m54.69s sys
>        (using only 1 server*)
>    nondiskless:            20m51.58s real  51m13.19s user  12m59.84s sys
>    " but /usr/obj is iSCSI:
>                            28m23.29s real  52m17.27s user  14m23.06s sys
>    " but /usr/src and /usr/obj is iSCSI:
>                            20m38.20s real  52m10.19s user  14m48.74s sys
>    diskless but /usr/src and /usr/obj is iSCSI:
>                            20m22.66s real  50m56.14s user  13m8.20s sys
>
> *: server in this case is a Xeon running in 64-bit mode, but with not
>    very fast ethernet - em0 at 1Gbps but at about 50% efficiency.
>    this server will 'make buildworld' in about 40 min. using the onboard
>    LSILogic v3 MegaRAID RAID0.

I recently tried to use 1Gbps ethernet more (instead of 100Mbps) and
hoped to get better makeworld performance, but actually got less.  The
problem seems to be just that nfs3 does too many attribute cache
refreshes, so although all the data fits in the VMIO cache there is a
lot of network activity, and 1Gbps ends up slower because my 1Gbps
NICs have a slightly higher latency than my 100Mbps NICs.  The 100Mbps
ones are fxp's and have a ping latency of about 100us, and the 1Gbps
ones are a bge and an sk and have a ping latency of 140us.  I think
these latencies are lower than average, but they are too large for
good makeworld-over-nfs performance.  makeworld generates about 2000
(or is it 5000?) packets/second, and waiting just 40us longer for each
of 2000 replies per second reduces performance by 8%, or about 120
seconds of the total buildworld time.
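Spelling that estimate out (a back-of-the-envelope check only; the
~1500 second base is the mostly-local-disk real time reported below):

    # ~2000 RPC replies/s, each arriving ~40us later on the 1Gbps NICs:
    echo "2000 * 40" | bc                  # 80000us lost per second, i.e. 8%
    echo "scale=1; 0.08 * 1476.58" | bc    # ~118 seconds of the build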
The faster NICs are better for bandwidth.  I get a max of 40MB/s for
read/write using tcp and about 25MB/s using udp.  tcp is apparently
faster because the latency is so bad that streaming in tcp reduces its
effects significantly.  However, using tcp for makeworld is a
pessimization.

All systems are 2-3GHz AthlonXPs with only 33MHz PCI buses, running a
2-year-old version of FreeBSD-current with local optimizations, with
/usr (including /usr/src) nfs3-mounted and local object and root trees
(initially empty).  "world" is actually only about 95% of the world.

100Mbps:
--------
     31532  maximum resident set size
      2626  average shared memory size
      1762  average unshared data size
       128  average unshared stack size
  15521898  page reclaims
     14904  page faults
         0  swaps
      1932  block input operations    <--- few of these, since bins and
                                           srcs come via nfs
     11822  block output operations   <--- it's not disk-bound
   1883576  messages sent
   1883480  messages received
     33448  signals received
   2104163  voluntary context switches
    472277  involuntary context switches

1Gbps/tcp:
----------
   1930.89 real  1222.87 user  184.10 sys   <--- way slower (real)

1Gbps/udp:
----------
   1909.86 real  1225.25 user  181.22 sys

mostly local disks (except /usr, not including /usr/src):
---------------------------------------------------------
   1476.58 real  1224.70 user  161.30 sys   <--- this is almost a properly
                                                 configured system, with
                                                 disks fast enough for
                                                 real = user + sys + epsilon

1Gbps/udp + the best tuning/hacking I could find:
    nfs access timeout 2 -> 60 (probably wrong for general use)
    sk interrupt moderation 100 -> 10 (reduces latency)
    delete zapping of attribute cache on open in nfs (probably a bug for
        general use; a PR says that this should always be done for ro
        mounts)
----------------------------------------------------------------------------
   1630.86 real  1227.86 user  175.09 sys
   ...
   1342791  messages sent       <--- tuning seems to work mainly by
   1343111  messages received   <--- reducing these; they are still large

1Gbps/udp + the best tuning I could find:
    nfs access timeout 2 -> 60
    sk interrupt moderation 100 -> 10
    no zapping of attribute cache on open in nfs
    -j4
-----------------------------------------------------------
   1599.74 real  1276.18 user  262.04 sys
   ...
   1727832  messages sent
   1726818  messages received

-j is normally bad for UP systems, but here it helps by using cycles
that would otherwise be idle.

Bruce
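For reference, the first of those knobs is a stock sysctl; a minimal
sketch of setting it (the value and caveat are as above; the sk(4)
interrupt moderation change and the removal of attribute-cache zapping
on open were local driver/kernel hacks with no stock knob, so they
appear only as comments):

    # nfs access cache timeout (stock sysctl, default 2 here);
    # probably wrong for general use, as noted above
    sysctl vfs.nfs.access_cache_timeout=60
    # sk(4) interrupt moderation 100 -> 10: local driver hack, no stock knob
    # no zapping of the attribute cache on open: local nfs patch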