From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 00:36:08 2005
From: David Rice
Organization: Globat
To: freebsd-performance@freebsd.org
Date: Thu, 17 Feb 2005 16:36:10 -0800
Message-Id: <200502171636.10361.drice@globat.com>
Subject: High traffic NFS performance and availability problems

We are a web hosting company that runs exclusively on FreeBSD. We are having
storage availability and performance problems. All of our storage is exported
via NFS to the client machines. Any suggestions or advice will be greatly
appreciated. We are willing to pay someone on a consulting basis to help us
solve these problems. Please email me off list if you are a consultant.

On the file server side we have:

Dell PowerEdge 1750's
Dell Perc4 RAID controller
Dell Power Vault 220 storage shelf.
(12) 146GB SCSI drives on one SCSI bus with a hot spare (~1.3TB file system)
Gigabit Ethernet to the NFS client machines
1GB RAM
(2) 2.4 GHz Xenon Processors
FreeBSD 5.2.1 or FreeBSD 5.3

On the client side we have:

Dell PowerEdge 1750's
1GB RAM
(2) 2.4 GHz Xenon Processors
Gigabit Ethernet
36GB SCSI root disk
FreeBSD 4.9 to FreeBSD 4.11

On the network side we have:
Gigabit Ethernet
Foundry BigIron and NetIron switches
Cisco 6509 with Gigabit switch blades

Typically we have 7 client boxes mounting storage from a single file server.
Each client box serves 1000 web sites and associated email. We have done the
basic NFS tuning (i.e., read/write size optimization and kernel tuning).

The problems we are having are as follows:

1. Slow performance during peak traffic periods.
2. Client boxes have high load averages and sometimes crash due to slow NFS
   performance.
3. File servers randomly crash with "Fatal trap 12: page fault while in
   kernel mode".
4. With soft updates enabled, during fsck the file server will freeze with all
   NFS processes in the "snaplck" state. We disabled soft updates because of
   this.

I can provide any other details about our configuration if needed.

David Rice
drice@globat.com

Thank You.

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 08:26:15 2005
Message-ID: <20050218082613.59492.qmail@web26801.mail.ukl.yahoo.com>
Date: Fri, 18 Feb 2005 09:26:13 +0100 (CET)
From: Claus Guttesen
To: David Rice, freebsd-performance@freebsd.org
In-Reply-To: <200502171636.10361.drice@globat.com>
Subject: Re: High traffic NFS performance and availability problems

> We are a web hosting company that runs exclusively
> on FreeBSD. We are having storage availability and
> performance problems. All of our storage is exported
> via NFS to the client machines. Any suggestions or
> advice will be greatly appreciated.
>
> On the file server side we have:
>
> Dell PowerEdge 1750's
> Dell Perc4 RAID controller
> Dell Power Vault 220 storage shelf.
> (12) 146GB SCSI drives on one SCSI bus with a hot
> spare (~1.3TB file system)
> Gigabit Ethernet to the NFS client machines
> 1GB RAM
> (2) 2.4 GHz Xenon Processors
> FreeBSD 5.2.1 or FreeBSD 5.3
>
> On the client side we have:
>
> Dell PowerEdge 1750's
> 1GB RAM
> (2) 2.4 GHz Xenon Processors
> Gigabit Ethernet
> 36GB SCSI root disk
> FreeBSD 4.9 to FreeBSD 4.11
>
> On the network side we have:
> Gigabit Ethernet
> Foundry BigIron and NetIron switches
> Cisco 6509 with Gigabit switch blades
> We have done the basic NFS tuning
> (i.e., read/write size
> optimization and kernel tuning)

You could try to cvsup to the latest RELENG_5 on
client and server. Are you using UDP? Try switching to
TCP if so; this may not apply to 4.x.

I have nine web servers which NFS-mount some TB of files; not incredibly fast,
but it works. All web servers are RELENG_5, and the NFS servers are 5.2-CURRENT
from Feb. 18th 2004, 5.3-BETA3 and 5.3 from Dec. 8th. I will upgrade all NFS
servers when RELENG_5_4 is released.

Claus

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 10:59:43 2005
Date: Fri, 18 Feb 2005 11:59:08 +0100
From: Jeremie Le Hen
To: Claus Guttesen
cc: David Rice
cc: freebsd-performance@freebsd.org
Message-ID: <20050218105908.GV82324@obiwan.tataz.chchile.org>
References: <200502171636.10361.drice@globat.com>
 <20050218082613.59492.qmail@web26801.mail.ukl.yahoo.com>
In-Reply-To: <20050218082613.59492.qmail@web26801.mail.ukl.yahoo.com>
Subject: Re: High traffic NFS performance and availability problems

> You could try to cvsup to the latest RELENG_5 on
> client and server. Are you using UDP? Try switching to
> TCP if so; this may not apply to 4.x.

AFAIK, RELENG_4 has a very robust NFS implementation. I believe that switching
the clients from 4.x to 5.x in the hope of improving NFS performance would be a
waste of time, but maybe I'm wrong.

Regards,
--
Jeremie Le Hen
jeremie at le-hen dot org

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 11:07:16 2005
Message-ID: <18f60194050218030737b807bd@mail.gmail.com>
Date: Fri, 18 Feb 2005 03:07:15 -0800
From: Aaron Glenn
Reply-To: Aaron Glenn
To: David Rice
cc: freebsd-performance@freebsd.org
In-Reply-To: <200502171636.10361.drice@globat.com>
References: <200502171636.10361.drice@globat.com>
Subject: Re: High traffic NFS performance and availability problems

On Thu, 17 Feb 2005 16:36:10 -0800, David Rice wrote:
> The problems we are having are as follows:
>
> 1. Slow performance during peak traffic periods.
> 2. Client boxes have high load averages and sometimes crash due to slow NFS
> performance.
> 3. File servers randomly crash with "Fatal trap 12: page fault while in
> kernel mode".
> 4. With soft updates enabled, during fsck the file server will freeze with all
> NFS processes in the "snaplck" state. We disabled soft updates because of
> this.
>
> I can provide any other details about our configuration if needed.
>
> David Rice
> drice@globat.com
>
> Thank You

Just how many MB/s are you pushing during peak periods? How are the file
servers connected to the clients? What have you looked at with vmstat? Why did
you move to 5.x on your file servers? Are you tracking -STABLE on those, or
just -RELEASE?
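
To put a number on the MB/s question, one could sample the file server's interface byte counters twice (on FreeBSD, "netstat -ib" reports per-interface byte counts) and divide by the interval. The Python below is a minimal, hypothetical helper, not something posted to the thread: the function names are made up, the counter values in the usage are invented, and the 125 MB/s ceiling is simply 1 Gbit/s divided by 8, ignoring Ethernet, IP and RPC overhead. The per-client figure uses the 7 clients per file server from the original post.

```python
"""Rough peak-throughput estimate for one NFS server link (hypothetical helper)."""

GIGE_BYTES_PER_SEC = 1_000_000_000 / 8  # raw 1 Gbit/s line rate, no protocol overhead


def throughput_mb_s(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Average MB/s (10**6 bytes) between two samples of an interface byte counter."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    if bytes_end < bytes_start:
        raise ValueError("counter went backwards (wrapped or reset?)")
    return (bytes_end - bytes_start) / seconds / 1_000_000


def link_utilization(mb_s: float) -> float:
    """Fraction of a raw gigabit link that the measured rate represents."""
    return mb_s * 1_000_000 / GIGE_BYTES_PER_SEC


def fair_share_per_client(clients: int) -> float:
    """MB/s each client gets if all of them try to saturate the server's one link."""
    return GIGE_BYTES_PER_SEC / clients / 1_000_000


if __name__ == "__main__":
    # Invented counter samples taken 10 seconds apart during a busy period.
    rate = throughput_mb_s(4_000_000_000, 4_600_000_000, 10.0)
    print(f"{rate:.1f} MB/s, {link_utilization(rate):.0%} of the link")
    print(f"{fair_share_per_client(7):.1f} MB/s per client with 7 clients")
```

If the measured peak sits near the raw ceiling, the single server link is a likely bottleneck; if it is far below, the disks, the RAID cache or the nfsd pool are better suspects.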

By the way, they're Xeons, not Xenons (-:

aaron.glenn

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 15:03:50 2005
Message-ID: <421603D0.3000403@centtech.com>
Date: Fri, 18 Feb 2005 09:03:44 -0600
From: Eric Anderson
To: David Rice
cc: freebsd-performance@freebsd.org
References: <200502171636.10361.drice@globat.com>
In-Reply-To: <200502171636.10361.drice@globat.com>
Subject: Re: High traffic NFS performance and availability problems

David Rice wrote:
> We are a web hosting company that runs exclusively on FreeBSD. We are having
> storage availability and performance problems. All of our storage is exported
> via NFS to the client machines. Any suggestions or advice will be greatly
> appreciated.
> We are willing to pay someone on a consulting basis to help us
> solve these problems. Please email me off list if you are a consultant.

I have lots of heavily abused NFS servers running FreeBSD (I usually have
400-500 3.xGHz P4s hammering each NFS server), so maybe I can help a bit.

> On the file server side we have:
>
> Dell PowerEdge 1750's
> Dell Perc4 RAID controller
> Dell Power Vault 220 storage shelf.
> (12) 146GB SCSI drives on one SCSI bus with a hot spare (~1.3TB file system)
> Gigabit Ethernet to the NFS client machines
> 1GB RAM
> (2) 2.4 GHz Xenon Processors
> FreeBSD 5.2.1 or FreeBSD 5.3

If you are on the 5.x branch, make sure you are running -STABLE (cvsup and
buildworld if you can), or at minimum 5.3-RELEASE.

> On the client side we have:
>
> Dell PowerEdge 1750's
> 1GB RAM
> (2) 2.4 GHz Xenon Processors
> Gigabit Ethernet
> 36GB SCSI root disk
> FreeBSD 4.9 to FreeBSD 4.11
>
> On the network side we have:
> Gigabit Ethernet
> Foundry BigIron and NetIron switches
> Cisco 6509 with Gigabit switch blades
>
>
> Typically we have 7 client boxes mounting storage from a single file server.
> Each client box serves 1000 web sites and associated email. We have
> done the basic NFS tuning (i.e., read/write size optimization and kernel tuning).
>
>
> The problems we are having are as follows:
>
> 1. Slow performance during peak traffic periods.
> 2. Client boxes have high load averages and sometimes crash due to slow NFS
> performance.
> 3. File servers randomly crash with "Fatal trap 12: page fault while in
> kernel mode".
> 4. With soft updates enabled, during fsck the file server will freeze with all
> NFS processes in the "snaplck" state. We disabled soft updates because of
> this.
>
> I can provide any other details about our configuration if needed.

During peak load, how do you 'feel' the slowness from the clients? Does an
'ls' in the directory over NFS stall? How about an 'ls -al'? What kind of
authentication mechanism are you using?
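
The 'ls' versus 'ls -al' check separates two kinds of NFS work: a bare listing mostly needs directory reads, while 'ls -al' also needs the attributes of every entry. The hypothetical Python helper below times both access patterns against one directory from a client. It only approximates what ls does, the directory argument is an assumption (pass a directory on the NFS mount), and the second pass may be helped by client-side attribute caching, so a large gap is a hint rather than proof.

```python
"""Time a plain directory read versus a read plus per-entry lstat (hypothetical helper)."""
import os
import sys
import time


def time_listing(path: str) -> tuple[float, float, int]:
    """Return (seconds for listdir, seconds for listdir + lstat per entry, entry count)."""
    start = time.perf_counter()
    names = os.listdir(path)  # roughly what a bare `ls` needs
    plain = time.perf_counter() - start

    start = time.perf_counter()
    for name in os.listdir(path):  # roughly what `ls -al` needs: names plus attributes
        try:
            os.lstat(os.path.join(path, name))
        except FileNotFoundError:
            pass  # entry removed between the listing and the stat; ignore it
    with_attrs = time.perf_counter() - start
    return plain, with_attrs, len(names)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."  # e.g. a directory on the NFS mount
    plain, with_attrs, count = time_listing(target)
    print(f"{count} entries: listdir {plain * 1000:.1f} ms, "
          f"listdir+lstat {with_attrs * 1000:.1f} ms")
```

Running it against the same large directory at a quiet hour and again at peak gives a before/after comparison that is easier to report than "it feels slow".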
How many nfsd processes do you have running on the server? During peak load,
how many of those nfsd processes are in a state other than 'nfsd' (use top,
ps, etc.)?

Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
I have seen the future and it is just like the present, only longer.
------------------------------------------------------------------------

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 15:49:11 2005
From: "Paul J. Pathiakis"
To: freebsd-performance@freebsd.org
cc: David Rice
Date: Fri, 18 Feb 2005 10:49:04 -0500
References: <200502171636.10361.drice@globat.com>
In-Reply-To: <200502171636.10361.drice@globat.com>
Message-Id: <200502181049.04999.pathiaki@pathiaki.com>
Subject: Re: High traffic NFS performance and availability problems

On Thursday 17 February 2005 19:36, David Rice wrote:
> We are a web hosting company that runs exclusively on FreeBSD. We are
> having storage availability and performance problems. All of our storage is
> exported via NFS to the client machines. Any suggestions or advice will be
> greatly appreciated. We are willing to pay someone on a consulting basis to
> help us solve these problems. Please email me off list if you are a
> consultant.

Hi David,

First off, I'm trying to understand your configuration. Do you have all your
clients' content on your server(s), exported to the client web server boxen?
That is my assumption.

Check your nfsstat output to see what's going on with the clients and servers.

I'm a big fan of properly planning out storage on systems and keeping things
flying at local disk speed. My use of NFS is for administration and user
accounts in a networked environment, not something like a web farm. You may
want to consider a paradigm shift.

Options:

1) I'm not sure of the current state of this, but I know all the storage
vendors are moving in this direction: iSCSI or rSCSI. I've seen some efforts
and momentum on this in the past. It's basically SAN on cheap hardware.
It works by encapsulating SCSI in IP and runs over a simple network of
switches and Cat5, along with iSCSI cards that can be inserted into large disk
repository servers.

2) I used to work at one of the largest web farms, Genuity/BBNPlanet. The
simplest thing that worked was using Jumpstart to create cookie-cutter boxen:
mirror hard drives for reliability, connect lots of disks with redundancy, and
cluster. (This can all be done with FreeBSD right now. Check into PXE booting
and gmirror (this rocks), build large machines with redundant controllers, and
cluster using pf/ALTQ/CARP.) Use iSCSI if you need to, but more than likely, if
you plan intelligently, you can make great use of local storage and virtual
hosts with Apache. As you grow, plug in more disks for customers or sharing.
Back up the whole thing with a centralized backup system using Bacula
(configure Bacula as part of Jumpstart so you don't have to screw with it too
much).

3) Check the performance numbers and tunings of NFS on various OSes, and tune
the heck out of it. There are tons of numbers out there, but I don't have them
offhand.

Also, with the new ATA code (if you're using ATA/SATA for storage; SCSI would
be better) and the new VFS code that is now Giant-free, 5.4 is going to be a
seriously kickass OS for the underlying I/O, before you even get to NFS. (If
your data is read-only, I believe exporting it read-only helps a lot. Of
course, I'm not sure how out of date my info is; I'm building a startup right
now.)
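
The suggestion to check nfsstat can be made a little more concrete. One common heuristic from general NFS tuning guidance (an assumption here, not something stated in this thread) is to compare each client's RPC retry count with its total request count, as shown in the "Rpc Info" section of "nfsstat -c"; a high ratio suggests requests are timing out and being resent, which is also where the UDP-versus-TCP question becomes relevant. The helper, its 5% threshold and the sample numbers below are all hypothetical.

```python
"""Flag NFS clients whose RPC retry rate looks high (hypothetical helper)."""
from dataclasses import dataclass

RETRY_WARN_RATIO = 0.05  # assumed rule-of-thumb threshold, not a FreeBSD constant


@dataclass
class RpcCounters:
    """Counters assumed to be copied by hand from a client's `nfsstat -c` output."""
    host: str
    requests: int
    retries: int


def retry_ratio(c: RpcCounters) -> float:
    """Retries per request; 0.0 when the client has sent nothing yet."""
    return c.retries / c.requests if c.requests else 0.0


def suspicious(clients: list[RpcCounters]) -> list[str]:
    """Hosts whose retry ratio exceeds the threshold, worst first."""
    flagged = [c for c in clients if retry_ratio(c) > RETRY_WARN_RATIO]
    flagged.sort(key=retry_ratio, reverse=True)
    return [c.host for c in flagged]


if __name__ == "__main__":
    # Invented numbers for three of the seven web servers.
    sample = [
        RpcCounters("web1", requests=2_000_000, retries=1_500),
        RpcCounters("web2", requests=1_800_000, retries=150_000),
        RpcCounters("web3", requests=1_900_000, retries=120_000),
    ]
    for c in sample:
        print(f"{c.host}: {retry_ratio(c):.2%} retries")
    print("check first:", suspicious(sample))
```

Comparing ratios across clients that share one file server helps separate a struggling server (all clients high) from a single bad client or switch port (one client high).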

Just trying to help,

Paul Pathiakis

From owner-freebsd-performance@FreeBSD.ORG Fri Feb 18 16:22:31 2005
From: David Gilbert
Message-ID: <16918.5696.196112.640502@canoe.dclg.ca>
Date: Fri, 18 Feb 2005 11:22:24 -0500
To: David Rice
cc: freebsd-performance@freebsd.org
In-Reply-To: <200502171636.10361.drice@globat.com>
References: <200502171636.10361.drice@globat.com>
Subject: High traffic NFS performance and availability problems

>>>>> "David" == David Rice writes:

David> The problems we are having are as follows:

David> 1. Slow performance during peak traffic periods.

This is due largely to the cache on your RAID hardware card. Unfortunately,
this is a failure mode of hardware RAID cards that you can't avoid (only
delay, by buying more disk).

David> 2. Client boxes have high load averages and sometimes crash due to
David> slow NFS performance.

Clients waiting for NFS requests are still considered "running" for
load-average purposes.

David> 3. File servers randomly crash with "Fatal trap 12: page fault while
David> in kernel mode". 4. With soft updates enabled, during fsck the file
David> server will freeze with all NFS processes in the "snaplck" state. We
David> disabled soft updates because of this.

The remainder of this sounds like memory corruption.

Dave.

--
============================================================================
|David Gilbert, Independent Contractor.      | Two things can only be      |
|Mail:       dave@daveg.ca                   |  equal if and only if they  |
|http://daveg.ca                             |   are precisely opposite.   |
=========================================================GLO================

From owner-freebsd-performance@FreeBSD.ORG Sat Feb 19 12:25:19 2005
Date: Sat, 19 Feb 2005 12:23:48 +0000 (GMT)
From: Robert Watson
To: David Rice
cc: freebsd-performance@freebsd.org
In-Reply-To: <200502171636.10361.drice@globat.com>
Subject: Re: High traffic NFS performance and availability problems

On Thu, 17 Feb 2005, David Rice wrote:

> Typically we have 7 client boxes mounting storage from a single file
> server. Each client box serves 1000 web sites and associated email. We
> have done the basic NFS tuning (i.e., read/write size optimization and
> kernel tuning).

How many nfsd's are you running with? If you run "systat -vmstat 1" on your
server under high load, could you send us the output? In particular, I'm
interested in knowing how the system is spending its time, the paging level,
and I/O throughput on devices; the systat -vmstat summary screen provides a
good summary of this and more. A few snapshots of "gstat" output would also be
very helpful, as would a snapshot or two of "top -S" output. This will give us
a picture of how the system is spending its time.

> 2. Client boxes have high load averages and sometimes crash due to
> slow NFS performance.

Could you be more specific about the crash failure mode?

> 3. File servers randomly crash with "Fatal trap 12: page fault
> while in kernel mode"

Could you make sure you're running with at least the latest 5.3 patch level on
the server, which includes some NFS server stability fixes, and also look at
moving to the head of 5-STABLE? There are a number of performance and
stability improvements there that may be relevant.

Could you provide serial console output of the full panic message and trap
details, compile the kernel with KDB+DDB, and include a full stack trace? I'm
happy to try to help debug these problems.

> 4. With soft updates enabled, during fsck the file server will freeze with
> all NFS processes in the "snaplck" state. We disabled soft updates
> because of this.

If it's possible to get some more information, it would be quite helpful. In
particular, could you compile the server kernel with DDB+KDB+BREAK_TO_DEBUGGER,
break into the serial debugger when it appears wedged, and capture the output
of "show lockedvnods", "ps", and "trace " for any processes listed in the
"show lockedvnods" output? That would be great. A crash dump would also be
very helpful.
For some hints on the information that is necessary here, take a look at the
handbook chapter on kernel debugging and reporting kernel bugs, and my recent
post to current@ diagnosing a similar bug.

If you re-enable soft updates but leave bgfsck disabled, does that correct
this stability problem?

In any case, I'm happy to help try to figure out what's going on; some of the
above information for the stability and performance problems would be quite
helpful in tracking it down.

Robert N M Watson

From owner-freebsd-performance@FreeBSD.ORG Sat Feb 19 12:26:17 2005
Date: Sat, 19 Feb 2005 12:24:46 +0000 (GMT)
From: Robert Watson
To: Jeremie Le Hen
cc: David Rice
cc: freebsd-performance@freebsd.org
cc: Claus Guttesen
In-Reply-To: <20050218105908.GV82324@obiwan.tataz.chchile.org>
Subject: Re: High traffic NFS performance and availability problems

On Fri, 18 Feb 2005, Jeremie Le Hen wrote:

> > You could try to cvsup to the latest RELENG_5 on
> > client and server. Are you using UDP? Try switching to
> > TCP if so; this may not apply to 4.x.
>
> AFAIK, RELENG_4 has a very robust NFS implementation. I believe that
> switching the clients from 4.x to 5.x in the hope of improving NFS
> performance would be a waste of time, but maybe I'm wrong.

I think leaving the 4.x clients in a known configuration and varying only the
server configuration is the right starting point. Let's try tracking down the
server 5.x stability/performance problems first, then look into the 4.x client
crash reports.

Robert N M Watson

From owner-freebsd-performance@FreeBSD.ORG Sat Feb 19 13:17:37 2005
Date: Sat, 19 Feb 2005 13:16:06 +0000 (GMT)
From: Robert Watson
To: performance@FreeBSD.org
Subject: libpthread vs libthread, simply mysql benchmark

In case it's of interest: I occasionally run MySQL "supersmack" benchmarks at
work on a dual Xeon box there.
I ran the select benchmark on the box a few minutes ago, comparing libpthread
and David Xu's new libthread, and the results are pleasing:

x 6-SMP-HTT-libpthread
+ 6-SMP-HTT-libthread
[ministat distribution plot of the two samples; layout lost in archiving]
    N           Min           Max        Median           Avg        Stddev
x  10       8300.25       8442.54       8379.03      8381.652     39.682672
+  10      10510.35      10646.98      10547.59     10551.599     39.893907
Difference at 95.0% confidence
        2169.95 +/- 37.385
        25.8893% +/- 0.446034%
        (Student's t, pooled s = 39.7884)

In other words, a clear (and healthy) 26% performance improvement on this
simple benchmark. I don't currently have other numbers to compare against,
such as linuxthreads, etc. This is a 2.4GHz box with 1GB of memory running
MySQL 4.0.23a. Typically I get better performance with hyper-threading turned
off, but I can't get into the BIOS of the box remotely, so I couldn't turn it
off properly. I used the latest 6.x SMP kernel combined with the libthread drop
from David's perforce branch. You can find a URL to his more recent code drops
in the recent threads on freebsd-threads.

FYI, here's a brief history of HTT performance over the past year as 5.x and
6.x matured:

Version                                   Transactions/sec
20040515-UP-4BSD                          4862
20040515-SMP-4BSD                         4620
20040515-SMP-ADMTX-4BSD                   4846

20040615-UP-4BSD                          4899
20040616-SMP-4BSD                         4941
20040616-SMP-ADMTX-4BSD                   4979

20040616-netperf-UP-giant-4BSD            4907
20040616-netperf-UP-mpsafe-4BSD           4939
20040616-netperf-SMP-giant-4BSD           4587
20040616-netperf-SMP-mpsafe-4BSD          4609
20040616-netperf-SMP-ADMTX-giant-4BSD     4662
20040616-netperf-SMP-ADMTX-mpsafe-4BSD    6425

20040713-netperf-SMP-ADMTX-mpsafe-4BSD    7063

20040717-netperf-SMP-ADMTX-mpsafe-4BSD    7118

As of today, I get about 8400tps with HTT turned on, probably a bit better
with it turned off.
By combining various factors we've introduced in the last couple of years, such as MPSAFE network stack, scheduling improvements, threading improvements, mutex changes, etc, we've improved performance on mysql by over 100% on SMP, going from quite sub-par performance in the depths of 5.x development (when all the infrastructure changes were going in but no optimizations) to quite healthy in 6.x, especially with the new threading library. Robert N M Watson From owner-freebsd-performance@FreeBSD.ORG Sat Feb 19 16:29:46 2005 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A1B3616A4CE; Sat, 19 Feb 2005 16:29:46 +0000 (GMT) Received: from stephanie.unixdaemons.com (stephanie.unixdaemons.com [67.18.111.194]) by mx1.FreeBSD.org (Postfix) with ESMTP id 273C243D41; Sat, 19 Feb 2005 16:29:46 +0000 (GMT) (envelope-from bmilekic@technokratis.com) Received: from stephanie.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1])j1JGTiI8096782; Sat, 19 Feb 2005 11:29:44 -0500 (EST) Received: (from bmilekic@localhost) by stephanie.unixdaemons.com (8.13.3/8.12.1/Submit) id j1JGTicV096781; Sat, 19 Feb 2005 11:29:44 -0500 (EST) (envelope-from bmilekic@technokratis.com) X-Authentication-Warning: stephanie.unixdaemons.com: bmilekic set sender to bmilekic@technokratis.com using -f Date: Sat, 19 Feb 2005 11:29:44 -0500 From: Bosko Milekic To: Robert Watson Message-ID: <20050219162944.GA96337@technokratis.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i cc: performance@freebsd.org Subject: Re: libpthread vs libthread, simply mysql benchmark X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Feb 2005 
16:29:46 -0000

On Sat, Feb 19, 2005 at 01:16:06PM +0000, Robert Watson wrote:
> FYI, here's a brief history of HTT performance over the past year as 5.x
> and 6.x matured:
>
> Version                                 Transactions/sec
> 20040515-UP-4BSD                        4862
> 20040515-SMP-4BSD                       4620
> 20040515-SMP-ADMTX-4BSD                 4846
>
> 20040615-UP-4BSD                        4899
> 20040616-SMP-4BSD                       4941
> 20040616-SMP-ADMTX-4BSD                 4979
>
> 20040616-netperf-UP-giant-4BSD          4907
> 20040616-netperf-UP-mpsafe-4BSD         4939
> 20040616-netperf-SMP-giant-4BSD         4587
> 20040616-netperf-SMP-mpsafe-4BSD        4609
> 20040616-netperf-SMP-ADMTX-giant-4BSD   4662
> 20040616-netperf-SMP-ADMTX-mpsafe-4BSD  6425
>
> 20040713-netperf-SMP-ADMTX-mpsafe-4BSD  7063
>
> 20040717-netperf-SMP-ADMTX-mpsafe-4BSD  7118
>
> As of today, I get about 8400tps with HTT turned on, probably a bit
> better with it turned off. By combining various factors we've introduced
> in the last couple of years, such as MPSAFE network stack, scheduling
> improvements, threading improvements, mutex changes, etc, we've improved
> performance on mysql by over 100% on SMP, going from quite sub-par
> performance in the depths of 5.x development (when all the infrastructure
> changes were going in but no optimizations) to quite healthy in 6.x,
> especially with the new threading library.

For reference, do you have numbers from RELENG_4?
--
Bosko Milekic
bmilekic@technokratis.com
bmilekic@FreeBSD.org

From owner-freebsd-performance@FreeBSD.ORG Sat Feb 19 16:53:14 2005 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A649E16A4CE for ; Sat, 19 Feb 2005 16:53:14 +0000 (GMT) Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 463DB43D48 for ; Sat, 19 Feb 2005 16:53:14 +0000 (GMT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by cyrus.watson.org (Postfix) with SMTP id A186546B23; Sat, 19 Feb 2005 11:53:13 -0500 (EST) Date: Sat, 19 Feb 2005 16:51:43 +0000 (GMT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Bosko Milekic In-Reply-To: <20050219162944.GA96337@technokratis.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: performance@freebsd.org Subject: Re: libpthread vs libthread, simply mysql benchmark X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Feb 2005 16:53:14 -0000

On Sat, 19 Feb 2005, Bosko Milekic wrote:

> > As of today, I get about 8400tps with HTT turned on, probably a bit
> > better with it turned off. By combining various factors we've introduced
> > in the last couple of years, such as MPSAFE network stack, scheduling
> > improvements, threading improvements, mutex changes, etc, we've improved
> > performance on mysql by over 100% on SMP, going from quite sub-par
> > performance in the depths of 5.x development (when all the infrastructure
> > changes were going in but no optimizations) to quite healthy in 6.x,
> > especially with the new threading library.
>
> For reference, do you have numbers from RELENG_4?

Unfortunately, not for this box.
I hope to get a chance to run the tests against a handy dual PIII with 4.x, 5.x, and 6.x in the next couple of days.

Just as an FYI to those wanting to give this a spin: to test with libthread on 6.x, you'll need a copy of MySQL linked against libc.so.6 so it can get to the _umtx_op symbol, which is not currently present in a 5.x libc. In practice, this means you'll need a package/port of MySQL built against 6.x to test with libthread. The package on ftp.FreeBSD.org worked fine for me.

Robert N M Watson
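Since libthread needs the _umtx_op symbol from libc, a quick way to see whether a given libc exports it is to ask the dynamic linker directly. The sketch below uses Python's ctypes and is only an illustration under stated assumptions: the helper name libc_exports is hypothetical, and the default library is whatever ctypes.util.find_library("c") reports on the running system. It checks a library, not a binary; ldd(1) on the mysqld binary shows which libc.so.N that binary was actually linked against.

```python
import ctypes
import ctypes.util


def libc_exports(symbol, lib_path=None):
    """Hypothetical helper: return True if the C library exports `symbol`.

    lib_path defaults to the result of ctypes.util.find_library("c").
    If that lookup returns None, ctypes.CDLL(None) opens the running
    process itself, whose global symbol scope includes libc on typical
    Unix systems.
    """
    if lib_path is None:
        lib_path = ctypes.util.find_library("c")
    lib = ctypes.CDLL(lib_path)
    # Attribute lookup on a CDLL performs a dlsym(); a missing symbol
    # raises AttributeError, which hasattr() reports as False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print("_umtx_op exported:", libc_exports("_umtx_op"))
```

For example, libc_exports("_umtx_op", "/lib/libc.so.6") on a 6.x box should report True, while the same check against a 5.x libc (assumed here to be /lib/libc.so.5) should report False, matching the note above that a 5.x libc lacks the symbol.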