From owner-freebsd-performance@FreeBSD.ORG Sun Aug 31 20:09:06 2003
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D732516A4BF for ; Sun, 31 Aug 2003 20:09:06 -0700 (PDT)
Received: from giskard.ag0ny.com (flets-tokyo-1-141.dsn.jp [61.213.134.141]) by mx1.FreeBSD.org (Postfix) with SMTP id 8E6AB43F3F for ; Sun, 31 Aug 2003 20:09:05 -0700 (PDT) (envelope-from ag0ny@ag0ny.com)
Received: (qmail 72799 invoked from network); 1 Sep 2003 03:09:02 -0000
Received: from flets-tokyo-1-141.dsn.jp (HELO www.ag0ny.com) (61.213.134.141) by 0 with SMTP; 1 Sep 2003 03:09:02 -0000
Received: from nat.isr.co.jp ([210.251.64.163]) (SquirrelMail authenticated user ag0ny1) by www.ag0ny.com with HTTP; Mon, 1 Sep 2003 12:09:02 +0900 (JST)
Message-ID: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com>
Date: Mon, 1 Sep 2003 12:09:02 +0900 (JST)
From: "Javi Lavandeira"
To: freebsd-performance@freebsd.org
User-Agent: SquirrelMail/1.4.0
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-2022-jp
X-Priority: 3
Importance: Normal
Subject: PPP performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: ag0ny@ag0ny.com
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 01 Sep 2003 03:09:07 -0000

Hi,

I'm running a 4.6.2-RELEASE system on a PPPoE link, using user-ppp. Since Saturday, we have been running a new service on this host, and we're getting around 160 DNS requests/second using tinydns, which serves several zones, one of them containing several hundred thousand records.

Bandwidth usage is low: around 25 KBytes/s in and 80-100 KBytes/s out on average.
The problem is that now user-ppp seems to be eating up 25% of the CPU:

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
   76 root      30    0 3976K  2364K RUN    727:32 22.17% 22.17% ppp

This system has been running for more than a year, and there have been transfer peaks of more than 20MBits/s, without impacting the ppp daemon in this way.

Any ideas about why ppp is causing this load, and how to solve the problem? I've been googling about this, and read something about kernel-ppp being faster, but couldn't find anything about kernel-ppp in the handbook.

Thanks in advance,

From owner-freebsd-performance@FreeBSD.ORG Mon Sep 1 11:34:39 2003
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 331A416A584 for ; Mon, 1 Sep 2003 11:34:39 -0700 (PDT)
Received: from abigail.blackend.org (blackend.org [212.11.35.229]) by mx1.FreeBSD.org (Postfix) with ESMTP id A4199442AA for ; Mon, 1 Sep 2003 11:31:41 -0700 (PDT) (envelope-from marc@blackend.org)
Received: from nosferatu.blackend.org (nosferatu.blackend.org [192.168.10.205]) by abigail.blackend.org (8.12.9/8.12.3) with ESMTP id h81IVCak048793; Mon, 1 Sep 2003 20:31:12 +0200 (CEST) (envelope-from marc@abigail.blackend.org)
Received: from nosferatu.blackend.org (localhost [127.0.0.1]) h81IUG74000748; Mon, 1 Sep 2003 20:30:16 +0200 (CEST) (envelope-from marc@nosferatu.blackend.org)
Received: (from marc@localhost) by nosferatu.blackend.org (8.12.9/8.12.9/Submit) id h81IUF9R000747; Mon, 1 Sep 2003 20:30:15 +0200 (CEST) (envelope-from marc)
Date: Mon, 1 Sep 2003 20:30:15 +0200
From: Marc Fonvieille
To: Javi Lavandeira
Message-ID: <20030901183015.GC578@nosferatu.blackend.org>
References: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com>
User-Agent:
Mutt/1.4i X-Useless-Header: blackend.org X-Operating-System: FreeBSD 5.1-CURRENT cc: freebsd-performance@freebsd.org Subject: Re: PPP performance X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Sep 2003 18:34:39 -0000 On Mon, Sep 01, 2003 at 12:09:02PM +0900, Javi Lavandeira wrote: [...] > > Any ideas about why ppp is causing this load, and how to solve the > problem? I've been googling about this, and read something about > kernel-ppp being faster, but couldn't find anything about kernel-ppp in > the handbook. > 18.3 Using Kernel PPP http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ppp.html Marc From owner-freebsd-performance@FreeBSD.ORG Mon Sep 1 17:39:12 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 74FBB16A4BF for ; Mon, 1 Sep 2003 17:39:12 -0700 (PDT) Received: from giskard.ag0ny.com (flets-tokyo-1-141.dsn.jp [61.213.134.141]) by mx1.FreeBSD.org (Postfix) with SMTP id A775F43FFD for ; Mon, 1 Sep 2003 17:39:08 -0700 (PDT) (envelope-from ag0ny@ag0ny.com) Received: (qmail 23644 invoked from network); 2 Sep 2003 00:39:04 -0000 Received: from flets-tokyo-1-141.dsn.jp (HELO www.ag0ny.com) (61.213.134.141) by 0 with SMTP; 2 Sep 2003 00:39:04 -0000 Received: from cosmos3.ag0ny.com ([192.168.0.2]) (SquirrelMail authenticated user ag0ny1) by www.ag0ny.com with HTTP; Tue, 2 Sep 2003 09:39:04 +0900 (JST) Message-ID: <3048.192.168.0.2.1062463144.squirrel@www.ag0ny.com> In-Reply-To: <20030901183015.GC578@nosferatu.blackend.org> References: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com> <20030901183015.GC578@nosferatu.blackend.org> Date: Tue, 2 Sep 2003 09:39:04 +0900 (JST) From: "Javi Lavandeira" To: "Marc Fonvieille" User-Agent: SquirrelMail/1.4.0 MIME-Version: 
1.0 Content-Type: text/plain;charset=iso-2022-jp X-Priority: 3 Importance: Normal cc: freebsd-performance@freebsd.org Subject: Re: PPP performance X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: ag0ny@ag0ny.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 00:39:12 -0000 Hi, > 18.3 Using Kernel PPP > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ppp.html Yes, but it doesn't mention PPPoE, which is what I need. Does the kernel PPP support PPPoE? Thanks in advance, From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 00:04:34 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 96E1616A4BF for ; Tue, 2 Sep 2003 00:04:34 -0700 (PDT) Received: from abigail.blackend.org (blackend.org [212.11.35.229]) by mx1.FreeBSD.org (Postfix) with ESMTP id CE21644017 for ; Tue, 2 Sep 2003 00:04:32 -0700 (PDT) (envelope-from marc@blackend.org) Received: from nosferatu.blackend.org (nosferatu.blackend.org [192.168.10.205]) by abigail.blackend.org (8.12.9/8.12.3) with ESMTP id h8274Mak069131; Tue, 2 Sep 2003 09:04:22 +0200 (CEST) (envelope-from marc@abigail.blackend.org) Received: from nosferatu.blackend.org (localhost [127.0.0.1]) h8273RKN000691; Tue, 2 Sep 2003 09:03:27 +0200 (CEST) (envelope-from marc@nosferatu.blackend.org) Received: (from marc@localhost) by nosferatu.blackend.org (8.12.9/8.12.9/Submit) id h8273NhR000690; Tue, 2 Sep 2003 09:03:23 +0200 (CEST) (envelope-from marc) Date: Tue, 2 Sep 2003 09:03:22 +0200 From: Marc Fonvieille To: Javi Lavandeira Message-ID: <20030902070322.GB568@nosferatu.blackend.org> References: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com> <20030901183015.GC578@nosferatu.blackend.org> <3048.192.168.0.2.1062463144.squirrel@www.ag0ny.com> Mime-Version: 1.0 
Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3048.192.168.0.2.1062463144.squirrel@www.ag0ny.com> User-Agent: Mutt/1.4i X-Useless-Header: blackend.org X-Operating-System: FreeBSD 5.1-CURRENT cc: freebsd-performance@freebsd.org Subject: Re: PPP performance X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 07:04:34 -0000 On Tue, Sep 02, 2003 at 09:39:04AM +0900, Javi Lavandeira wrote: > Hi, > > > 18.3 Using Kernel PPP > > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ppp.html > > Yes, but it doesn't mention PPPoE, which is what I need. Does the kernel > PPP support PPPoE? > You need a PPPoE client. Look on google for the "Linux way" to use PPPoE with pppd. (Now this talk is out of freebsd-performance aim :) ) Marc From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 12:42:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1896716A4BF; Tue, 2 Sep 2003 12:42:45 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 472B343FFB; Tue, 2 Sep 2003 12:42:44 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKLQJB00.83I; Tue, 2 Sep 2003 12:37:59 -0700 From: "Max Clark" To: , , Date: Tue, 2 Sep 2003 12:48:29 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 Subject: FW: 20TB Storage 
System
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Sep 2003 19:42:45 -0000

Sorry for the cross post.

-----Original Message-----
From: owner-freebsd-isp@freebsd.org [mailto:owner-freebsd-isp@freebsd.org]On Behalf Of Max Clark
Sent: Tuesday, September 02, 2003 11:00 AM
To: freebsd-isp@freebsd.org
Subject: 20TB Storage System

Hi all,

I need to attach 20TB of storage to a network (at as low a cost as possible), and I need to sustain 250Mbit/s, or about 30MByte/s, of IO from the storage to the disk.

I have found external Fibre Channel -> ATA 133 Raid enclosures. These enclosures will house 16 drives, so with 250GB drives a total of 3.5TB each after a RAID 5 format. These enclosures have advertised sustained IO of 90-100MByte/s each.

One solution we are thinking about is to use an Intel Xeon server with 3x FC HBA controller cards in the server, each attached to a separate storage enclosure. In any event we would be required to use ccd or vinum to stripe multiple storage enclosures together to form one logical volume.

I can partition this system into two separate 10TB storage pools.

Given the above:
1) What would my expected IO be using vinum to stripe the storage enclosures detailed above?
2) What is the maximum size of a filesystem that I can present to the host OS using vinum/ccd? Am I limited anywhere that I am not aware of?
3) Could I put all 20TB on one system, or will I need two to sustain the IO required?
4) If you were building this system how would you do it? (The installed $/GB must be below $5.00).

My other options are to use Solaris or Windows (which I would rather not do).
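
The sizing figures in the message above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 TB = 1000 GB) and a plain RAID 5 layout that gives up one drive's worth of capacity to parity, with no hot spare. Neither assumption is stated in the thread, and the function names are made up for illustration. Under those assumptions the raw RAID 5 capacity comes out at 3.75TB per enclosure, somewhat above the 3.5TB quoted, and the thread does not say where the difference goes.

```python
import math

# Figures quoted in the message; everything else here is an illustrative assumption.
DRIVES_PER_ENCLOSURE = 16
DRIVE_GB = 250
TARGET_GB = 20_000          # 20TB, assuming 1 TB = 1000 GB
TARGET_MBIT_PER_S = 250


def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """Usable capacity of one RAID 5 set: one drive's worth goes to parity."""
    return (drives - 1) * drive_gb


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


usable_gb = raid5_usable_gb(DRIVES_PER_ENCLOSURE, DRIVE_GB)
enclosures_needed = math.ceil(TARGET_GB / usable_gb)
target_mbyte_per_s = mbit_to_mbyte(TARGET_MBIT_PER_S)

print(f"usable per enclosure: {usable_gb} GB")
print(f"enclosures for 20TB:  {enclosures_needed}")
print(f"250 Mbit/s in MB/s:   {target_mbyte_per_s}")
```

Using the 3.5TB figure from the message instead also gives six enclosures for 20TB, so three enclosures striped together would hold roughly half of the total, which matches the idea of two separate 10TB pools.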
Thanks in advance, Max _______________________________________________ freebsd-isp@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-isp To unsubscribe, send any mail to "freebsd-isp-unsubscribe@freebsd.org" From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 12:50:10 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F031B16A4BF for ; Tue, 2 Sep 2003 12:50:10 -0700 (PDT) Received: from otter3.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 020A743FE0 for ; Tue, 2 Sep 2003 12:50:10 -0700 (PDT) (envelope-from anderson@centtech.com) Received: from centtech.com (neutrino.centtech.com [204.177.173.28]) by otter3.centtech.com (8.12.3/8.12.3) with ESMTP id h82Jo9ob095688; Tue, 2 Sep 2003 14:50:09 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <3F54F46E.2070203@centtech.com> Date: Tue, 02 Sep 2003 14:50:06 -0500 From: Eric Anderson User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Max Clark References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org Subject: Re: FW: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 19:50:11 -0000 Max Clark wrote: > Hi all, > > I need to attach 20TB of storage to a network (as low cost as possible), I > need to sustain 250Mbit/s or 30MByte/s of sustained IO from the storage to > the disk. > > I have found external Fibre Channel -> ATA 133 Raid enclosures. These > enclosures will house 16 drives so with 250GB drives a total of 3.5TB each > after a RAID 5 format. 
These enclosures have advertised sustained IO of > 90-100MByte/s each. > > One solution we are thinking about is to use a Intel XEON server with 3x FC > HBA controller cards in the server each attached to a separate storage > enclosure. In any event we would be required to use ccd or vinum to stripe > multiple storage enclosures together to form one logical volume. > > I can partition this system into two separate 10TB storage pools. > > Given the above: > 1) What would my expected IO be using vinum to stripe the storage enclosures > detailed above? > 2) What is the maximum size of a filesystem that I can present to the host > OS using vinum/ccd? Am I limited anywhere that I am not aware of? > 3) Could I put all 20TB on one system, or will I need two to sustain the IO > required? > 4) If you were building this system how would you do it? (The installed $/GB > must be below $5.00 dollars). > > My other options are to use Solaris or Windows (which I would rather not > do). I can tell you right now I have Solaris and Windows machines attempting file server traffic, and only Solaris even gets in the right realm of speed, but FreeBSD blows them both flat over. Your bottleneck will most likely be the bus speed of the host, so make sure to use PCI-X adapters if possible. Also, how are you sharing this data? NFS? Samba? FTP? Eric -- ------------------------------------------------------------------ Eric Anderson Systems Administrator Centaur Technology All generalizations are false, including this one. 
------------------------------------------------------------------

From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 13:01:49 2003
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A047B16A4BF; Tue, 2 Sep 2003 13:01:49 -0700 (PDT)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 60EF044001; Tue, 2 Sep 2003 13:01:48 -0700 (PDT) (envelope-from phk@phk.freebsd.dk)
Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h82K1ii8050600; Tue, 2 Sep 2003 22:01:45 +0200 (CEST) (envelope-from phk@phk.freebsd.dk)
To: "Max Clark"
From: "Poul-Henning Kamp"
In-Reply-To: Your message of "Tue, 02 Sep 2003 12:48:29 PDT."
Date: Tue, 02 Sep 2003 22:01:44 +0200
Message-ID: <50599.1062532904@critter.freebsd.dk>
cc: freebsd-hackers@freebsd.org
cc: freebsd-performance@freebsd.org
cc: freebsd-questions@freebsd.org
Subject: Re: FW: 20TB Storage System
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Sep 2003 20:01:49 -0000

In message , "Max Clark" writes:

>Given the above:
>1) What would my expected IO be using vinum to stripe the storage enclosures
>detailed above?

That depends a lot on the application's I/O pattern, and I doubt a precise prediction is possible. In particular, FibreChannel throughput is hard to predict because the various implementations each seem to have their own peculiar performance quirks.

On SEAGATE ST318452 disks, I see sequential transfer rates at the outside rim of the disk of 58MB/sec. If I stripe two of them with CCD I get 107MB/sec.

CCD has better performance than Vinum where the two can be compared.
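
The two throughput numbers quoted above give a rough idea of how close a two-disk CCD stripe gets to ideal scaling. The sketch below only restates that one measurement as a ratio; the function name is hypothetical, and nothing here predicts how a stripe across more disks or across whole enclosures would behave.

```python
# Measurements quoted in the message above (sequential, outer rim of the disk).
SINGLE_DISK_MB_S = 58.0
TWO_DISK_STRIPE_MB_S = 107.0


def stripe_efficiency(stripe_mb_s: float, single_mb_s: float, disks: int) -> float:
    """Measured stripe throughput as a fraction of perfect linear scaling."""
    return stripe_mb_s / (single_mb_s * disks)


efficiency = stripe_efficiency(TWO_DISK_STRIPE_MB_S, SINGLE_DISK_MB_S, disks=2)
print(f"two-disk stripe reaches {efficiency:.1%} of linear scaling")
```

The ratio applies to this particular pair of disks and this sequential workload; it should not be extrapolated to wider stripes.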
RAID-5 and striping a large number of disks does not scale linearly performance wise, in particular you _may_ see your average access time drop somewhat, but there is by far no guarantee that it will be better than the individual drive. >2) What is the maximum size of a filesystem that I can present to the host >OS using vinum/ccd? Am I limited anywhere that I am not aware of? Good question, I'm not sure we currently know the exact barrier. >3) Could I put all 20TB on one system, or will I need two to sustain the IO >required? Spreading it will give you more I/O bandwidth. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 13:06:15 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 35BD016A4BF for ; Tue, 2 Sep 2003 13:06:15 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1AA0C44014 for ; Tue, 2 Sep 2003 13:06:14 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKLRMG00.V3N; Tue, 2 Sep 2003 13:01:28 -0700 From: "Max Clark" To: "Eric Anderson" Date: Tue, 2 Sep 2003 13:11:59 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <3F54F46E.2070203@centtech.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-performance@freebsd.org Subject: RE: FW: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 
2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 20:06:15 -0000 This will be mostly Samba with a little bit of FTP. What about the Raid 0 Stripe to combine the disk shelves, ccd or vinum? What will I get better performance with, what should I expect as I add each shelf? Is there anyone out there with 5+TB of storage configured like this that could share some performance numbers? Thanks, Max -----Original Message----- From: Eric Anderson [mailto:anderson@centtech.com] Sent: Tuesday, September 02, 2003 12:50 PM To: Max Clark Cc: freebsd-performance@freebsd.org Subject: Re: FW: 20TB Storage System Max Clark wrote: > Hi all, > > I need to attach 20TB of storage to a network (as low cost as possible), I > need to sustain 250Mbit/s or 30MByte/s of sustained IO from the storage to > the disk. > > I have found external Fibre Channel -> ATA 133 Raid enclosures. These > enclosures will house 16 drives so with 250GB drives a total of 3.5TB each > after a RAID 5 format. These enclosures have advertised sustained IO of > 90-100MByte/s each. > > One solution we are thinking about is to use a Intel XEON server with 3x FC > HBA controller cards in the server each attached to a separate storage > enclosure. In any event we would be required to use ccd or vinum to stripe > multiple storage enclosures together to form one logical volume. > > I can partition this system into two separate 10TB storage pools. > > Given the above: > 1) What would my expected IO be using vinum to stripe the storage enclosures > detailed above? > 2) What is the maximum size of a filesystem that I can present to the host > OS using vinum/ccd? Am I limited anywhere that I am not aware of? > 3) Could I put all 20TB on one system, or will I need two to sustain the IO > required? > 4) If you were building this system how would you do it? (The installed $/GB > must be below $5.00 dollars). 
> > My other options are to use Solaris or Windows (which I would rather not > do). I can tell you right now I have Solaris and Windows machines attempting file server traffic, and only Solaris even gets in the right realm of speed, but FreeBSD blows them both flat over. Your bottleneck will most likely be the bus speed of the host, so make sure to use PCI-X adapters if possible. Also, how are you sharing this data? NFS? Samba? FTP? Eric -- ------------------------------------------------------------------ Eric Anderson Systems Administrator Centaur Technology All generalizations are false, including this one. ------------------------------------------------------------------ From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 13:09:14 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4484116A549; Tue, 2 Sep 2003 13:09:14 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1620B43FE5; Tue, 2 Sep 2003 13:09:13 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKLRRG00.D3I; Tue, 2 Sep 2003 13:04:28 -0700 From: "Max Clark" To: "Poul-Henning Kamp" Date: Tue, 2 Sep 2003 13:14:58 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <50599.1062532904@critter.freebsd.dk> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: RE: FW: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: 
Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Sep 2003 20:09:14 -0000

I know adding ccd/vinum to the equation will lower my IO throughput, but the question is... if I have an external hardware shelf with 3.5TB (16 250GB drives w/ Raid 5 from hardware) and I put a Raid 0 stripe across 3 of these shelves, what would my expected loss of IO be?

Thanks,
Max

-----Original Message-----
From: Poul-Henning Kamp [mailto:phk@phk.freebsd.dk]
Sent: Tuesday, September 02, 2003 1:02 PM
To: Max Clark
Cc: freebsd-questions@freebsd.org; freebsd-performance@freebsd.org; freebsd-hackers@freebsd.org
Subject: Re: FW: 20TB Storage System

In message , "Max Clark" writes:

>Given the above:
>1) What would my expected IO be using vinum to stripe the storage enclosures
>detailed above?

That depends a lot on the application's I/O pattern, and I doubt a precise prediction is possible. In particular, FibreChannel throughput is hard to predict because the various implementations each seem to have their own peculiar performance quirks.

On SEAGATE ST318452 disks, I see sequential transfer rates at the outside rim of the disk of 58MB/sec. If I stripe two of them with CCD I get 107MB/sec.

CCD has better performance than Vinum where the two can be compared.

RAID-5 and striping a large number of disks does not scale linearly performance wise, in particular you _may_ see your average access time drop somewhat, but there is by far no guarantee that it will be better than the individual drive.

>2) What is the maximum size of a filesystem that I can present to the host
>OS using vinum/ccd? Am I limited anywhere that I am not aware of?

Good question, I'm not sure we currently know the exact barrier.

>3) Could I put all 20TB on one system, or will I need two to sustain the IO
>required?

Spreading it will give you more I/O bandwidth.
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 13:12:32 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 37A3B16A4BF; Tue, 2 Sep 2003 13:12:32 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 219524400E; Tue, 2 Sep 2003 13:12:31 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h82KCUi8050922; Tue, 2 Sep 2003 22:12:30 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: "Max Clark" From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 02 Sep 2003 13:14:58 PDT." Date: Tue, 02 Sep 2003 22:12:30 +0200 Message-ID: <50921.1062533550@critter.freebsd.dk> cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: FW: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 20:12:32 -0000 In message , "Max Clark" writ es: >I know adding ccd/vinum to the equation will lower my IO throughput, but the >question is... if I have an external hardware shelf with 3.5TB (16 250GB >drives w/ Raid 5 from hardware) and I put a Raid 0 stripe across 3 of these >shelves what would my expected loss of IO be? The loss will mostly be from latency, but how much is impossible to tell I think. The statistics of this, even with my trusty old Erlang table would still be too uncertain to be of any value. 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 13:34:00 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 948A416A4BF; Tue, 2 Sep 2003 13:34:00 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2B44243FD7; Tue, 2 Sep 2003 13:33:59 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h82KXvZH001718; Tue, 2 Sep 2003 23:33:57 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F54FEB3.4050005@he.iki.fi> Date: Tue, 02 Sep 2003 23:33:55 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Poul-Henning Kamp References: <50599.1062532904@critter.freebsd.dk> In-Reply-To: <50599.1062532904@critter.freebsd.dk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org cc: Max Clark Subject: Re: FW: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 20:34:00 -0000 Poul-Henning Kamp wrote: >>2) What is the maximum size of a filesystem that I can present to the host >>OS using vinum/ccd? Am I limited anywhere that I am not aware of? >> >> > >Good question, I'm not sure we currently know the exact barrier. 
>
Just make sure you run UFS2, which is the default on -CURRENT, because UFS1 has a 1TB limit.

>>3) Could I put all 20TB on one system, or will I need two to sustain the IO
>>required?
>>
>>
>
>Spreading it will give you more I/O bandwidth.
>
>
>
Can you say why? Usually putting more spindles into one pile gives you more I/O, unless you have very evenly distributed sequential access in a pattern you can predict in advance.

Pete

From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 14:43:29 2003
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CE07216A4F4; Tue, 2 Sep 2003 14:43:28 -0700 (PDT)
Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id BC0B344031; Tue, 2 Sep 2003 14:43:27 -0700 (PDT) (envelope-from max.clark@media.net)
Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKLW4I00.C3O; Tue, 2 Sep 2003 14:38:42 -0700
From: "Max Clark"
To: "Petri Helenius" , "Poul-Henning Kamp"
Date: Tue, 2 Sep 2003 14:49:13 -0700
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
Importance: Normal
In-Reply-To: <3F54FEB3.4050005@he.iki.fi>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
cc: freebsd-hackers@freebsd.org
cc: freebsd-performance@freebsd.org
cc: freebsd-questions@freebsd.org
Subject: RE: FW: 20TB Storage System
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 02 Sep 2003 21:43:29 -0000

Just make sure you run UFS2, which is the default on -CURRENT, because UFS1 has a 1TB limit.
- What's the limit with UFS2? Are there major requirements to run FreeBSD 5.x or can I still run stable with this? Thanks, Max From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 15:48:09 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E12616A4BF; Tue, 2 Sep 2003 15:48:09 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 548EF43FFB; Tue, 2 Sep 2003 15:48:08 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKLZ4A00.N3E; Tue, 2 Sep 2003 15:43:22 -0700 From: "Max Clark" To: "Dan Nelson" Date: Tue, 2 Sep 2003 15:53:53 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <20030902224136.GA98381@dan.emsphone.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: RE: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 22:48:09 -0000 Depends on whether you plan on crashing or not :) According to http://lists.freebsd.org/pipermail/freebsd-fs/2003-July/000181.html, you may not want to create filesystems over 3TB if you want fsck to succeed. I don't know if that's using the default newfs settings (which would create an insane number of inodes), though. - This is a big problem (no pun intended), my smallest requirement is still 5TB... 
what would you recommend? The smallest file on the storage will be 500MB. To sustain only 30MByte/s across the entire set? Doesn't really matter, since even a single disk could do that. - What would I see better performance with ccd or vinum? So a better question isn't if I can sustain with 30MByte/s but what would I expect to maintain? From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 15:41:41 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 42D8716A4BF; Tue, 2 Sep 2003 15:41:41 -0700 (PDT) Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2C7C043F75; Tue, 2 Sep 2003 15:41:38 -0700 (PDT) (envelope-from dan@dan.emsphone.com) Received: (from dan@localhost) by dan.emsphone.com (8.12.9/8.12.9) id h82MfbR2075085; Tue, 2 Sep 2003 17:41:37 -0500 (CDT) (envelope-from dan) Date: Tue, 2 Sep 2003 17:41:37 -0500 From: Dan Nelson To: Max Clark Message-ID: <20030902224136.GA98381@dan.emsphone.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 5.1-CURRENT X-message-flag: Outlook Error User-Agent: Mutt/1.5.4i X-Mailman-Approved-At: Tue, 02 Sep 2003 17:17:33 -0700 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 22:41:41 -0000 In the last episode (Sep 02), Max Clark said: > 2) What is the maximum size of a filesystem that I can present to the > host OS using vinum/ccd? Am I limited anywhere that I am not aware > of? 
Depends on whether you plan on crashing or not :) According to http://lists.freebsd.org/pipermail/freebsd-fs/2003-July/000181.html, you may not want to create filesystems over 3TB if you want fsck to succeed. I don't know if that's using the default newfs settings (which would create an insane number of inodes), though. > 3) Could I put all 20TB on one system, or will I need two to sustain > the IO required? To sustain only 30MByte/s across the entire set? Doesn't really matter, since even a single disk could do that. -- Dan Nelson dnelson@allantgroup.com From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 16:29:06 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2C07C16A4BF; Tue, 2 Sep 2003 16:29:06 -0700 (PDT) Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1C05343FB1; Tue, 2 Sep 2003 16:29:05 -0700 (PDT) (envelope-from dan@dan.emsphone.com) Received: (from dan@localhost) by dan.emsphone.com (8.12.9/8.12.9) id h82NT2nR044828; Tue, 2 Sep 2003 18:29:02 -0500 (CDT) (envelope-from dan) Date: Tue, 2 Sep 2003 18:29:02 -0500 From: Dan Nelson To: Max Clark Message-ID: <20030902232902.GB98381@dan.emsphone.com> References: <20030902224136.GA98381@dan.emsphone.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 5.1-CURRENT X-message-flag: Outlook Error User-Agent: Mutt/1.5.4i X-Mailman-Approved-At: Tue, 02 Sep 2003 17:17:33 -0700 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Sep 2003 23:29:06 -0000 In the last episode 
(Sep 02), Max Clark said: [ quoting format manually recovered ] > Dan Nelson wrote > > Depends on whether you plan on crashing or not :) According to > > http://lists.freebsd.org/pipermail/freebsd-fs/2003-July/000181.html, > > you may not want to create filesystems over 3TB if you want fsck to > > succeed. I don't know if that's using the default newfs settings > > (which would create an insane number of inodes), though. > > This is a big problem (no pun intended), my smallest requirement is > still 5TB... what would you recommend? The smallest file on the > storage will be 500MB. I'd say try formatting a 5TB filesystem with the values you'd use (use a very large -i; 1048576 maybe?) and see how much memory fsck consumes. I don't know what UFS2's max blocksize is, but a larger blocksize would help too. You should be able to fake enough storage to do the test with mdconfig and some large sparse files. > > To sustain only 30MByte/s across the entire set? Doesn't really > > matter, since even a single disk could do that. > > What would I see better performance with ccd or vinum? So a better > question isn't if I can sustain with 30MByte/s but what would I > expect to maintain? For sequential access to mirrored arrays, your bottleneck will probably be the ATA->FC bridges, since they claim to only do 100MBytes/sec. If your three HBAs are 1gbit, then those will be your bottleneck and you'll be able to do 300MB/s reads, and 150MB/s writes (50% mirror penalty). If they're 2gbit and you have 6 bridges, you'll max out at 600MB/s and 300MB/s. If you want to use vinum raid5, cut those write speeds in half again (25% raid-5 penalty). Theoretically, assuming you can max your FC links and your server can handle the load :) I do mrtg graphs of my fibre switches, and I haven't seen it peak over 80MB/sec through a 1gbit link, but I regularly see 70MB/sec sustained to some Tru64 Alpha servers. 
I only have external hardware raid, though, so I don't know what kind of penalty ccd/vinum will add on top of that. Shouldn't be too much. -- Dan Nelson dnelson@allantgroup.com From owner-freebsd-performance@FreeBSD.ORG Tue Sep 2 20:11:16 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C410216A4BF for ; Tue, 2 Sep 2003 20:11:16 -0700 (PDT) Received: from giskard.ag0ny.com (flets-tokyo-1-141.dsn.jp [61.213.134.141]) by mx1.FreeBSD.org (Postfix) with SMTP id 3A8A343FFD for ; Tue, 2 Sep 2003 20:11:15 -0700 (PDT) (envelope-from ag0ny@ag0ny.com) Received: (qmail 32269 invoked from network); 3 Sep 2003 03:11:11 -0000 Received: from flets-tokyo-1-141.dsn.jp (HELO www.ag0ny.com) (61.213.134.141) by 0 with SMTP; 3 Sep 2003 03:11:11 -0000 Received: from nat.isr.co.jp ([210.251.64.163]) (SquirrelMail authenticated user ag0ny1) by www.ag0ny.com with HTTP; Wed, 3 Sep 2003 12:11:11 +0900 (JST) Message-ID: <47240.210.251.64.163.1062558671.squirrel@www.ag0ny.com> In-Reply-To: <20030902070322.GB568@nosferatu.blackend.org> References: <33399.210.251.64.163.1062385742.squirrel@www.ag0ny.com><2003090118301 5.GC578@nosferatu.blackend.org><3048.192.168.0.2.1062463144.squirrel@w ww.ag0ny.com> <20030902070322.GB568@nosferatu.blackend.org> Date: Wed, 3 Sep 2003 12:11:11 +0900 (JST) From: "Javi Lavandeira" To: "Marc Fonvieille" User-Agent: SquirrelMail/1.4.0 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-2022-jp X-Priority: 3 Importance: Normal cc: freebsd-performance@freebsd.org Subject: Re: PPP performance X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: ag0ny@ag0ny.com List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 03:11:16 -0000 Hi, >> Yes, but it doesn't mention PPPoE, which is what I need. 
Does the kernel >> PPP support PPPoE? > > You need a PPPoE client. Look on google for the "Linux way" to use PPPoE > with pppd. Yep, I had already googled a bit and I'm currently looking at /usr/ports/net/mpd. > (Now this talk is out of freebsd-performance aim :) ) That's right. If mpd's performance is as bad as /usr/sbin/ppp's, I'll ask in -net. :) Thanks a lot. Regards, -- Javi Lavandeira - http://www.ag0ny.com From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 08:43:36 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7A31616A4C0 for ; Wed, 3 Sep 2003 08:43:36 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 64E5F43F85 for ; Wed, 3 Sep 2003 08:43:35 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKNA4Q00.040; Wed, 3 Sep 2003 08:38:50 -0700 From: "Max Clark" To: "Dan Nelson" Date: Wed, 3 Sep 2003 08:49:20 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <20030903020648.GC98381@dan.emsphone.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-performance@freebsd.org Subject: RE: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 15:43:36 -0000 -----Original Message----- From: Dan Nelson [mailto:dnelson@allantgroup.com] Sent: Tuesday, September 02, 2003 7:07 PM To: Max Clark Subject: Re: 20TB Storage System > Would I be better 
off using ccd or vinum for a raid 1 stripe here? Over hardware raid? Never; hardware raid is always better than software raid (but is always more expensive too). If you're asking which of ccd and vinum is faster, I don't know. Try both and tell us :) - You misunderstood. I am planning to do a Hardware Raid 5 with each shelf giving 3.5TB of disk. Then I need to put a Raid 0 stripe across the 3 shelves with software to give me the ~10TB filesystem. For this, which would be better... ccd or vinum? -Max From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 09:19:13 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C2D2316A4BF for ; Wed, 3 Sep 2003 09:19:13 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4CCEF43FDF for ; Wed, 3 Sep 2003 09:19:12 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83GJ62k012517; Wed, 3 Sep 2003 19:19:06 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F561479.2040305@he.iki.fi> Date: Wed, 03 Sep 2003 19:19:05 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Max Clark References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org cc: Dan Nelson Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 16:19:13 -0000 Max Clark wrote: > >- You misunderstood. I am planning to do a Hardware Raid 5 with each shelf >giving 3.5TB of disk.
Then I need to put a Raid 0 stripe across the 3 >shelves with software to give me the ~10TB filesystem. For this, which would >be better... ccd or vinum? > > > The only ways to make sure are: 1) go through the source and look for places where block numbers are carried in 32-bit variables 2) test it by filling the system with more than 1TB used on each subdisk (>3TB in your case) I think there would be quite a few people appreciating the results. However, the fsck problem will persist for filesystems larger than 3TB. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 09:59:41 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2FCAD16A4BF for ; Wed, 3 Sep 2003 09:59:41 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8245443F93 for ; Wed, 3 Sep 2003 09:59:40 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKNDNJ00.R47; Wed, 3 Sep 2003 09:54:55 -0700 From: "Max Clark" To: "Petri Helenius" Date: Wed, 3 Sep 2003 10:05:25 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <3F561479.2040305@he.iki.fi> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-performance@freebsd.org cc: Dan Nelson Subject: RE: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 16:59:41 -0000 I think there would be quite a few people appreciating the results.
However, the fsck problem will persist for filesystems larger than 3TB. - What exactly is the fsck problem? What do I do about it? Max From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 10:12:52 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3439F16A4BF for ; Wed, 3 Sep 2003 10:12:52 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id E12F943FF9 for ; Wed, 3 Sep 2003 10:12:50 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83HCl2k012705; Wed, 3 Sep 2003 20:12:47 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F56210E.7010206@he.iki.fi> Date: Wed, 03 Sep 2003 20:12:46 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Max Clark References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: freebsd-performance@freebsd.org cc: Dan Nelson Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 17:12:52 -0000 Max Clark wrote: >- What exactly is the fsck problem? What do I do about it? > > > fsck requires approximately 700K of memory for each gigabyte of disk space. I'm unfortunately not familiar enough with the issue to know how this splits out for blocks and inodes (for example, whether having only a million inodes on a 10TB fs would make it tolerable), but taking the figure presented earlier, you would need 7GB of memory for checking a 10TB filesystem.
Having that kind of memory for a single process necessitates a 64-bit system, like sparc64, alpha, itanic or opteron. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:02:44 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8AA0C16A4BF; Wed, 3 Sep 2003 11:02:44 -0700 (PDT) Received: from Chow.corp.media.net (rottie.media.net [66.113.65.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id C247543FCB; Wed, 3 Sep 2003 11:02:43 -0700 (PDT) (envelope-from max.clark@media.net) Received: from MCLARK (76.0.6.10.IN-ADDR.ARPA [10.6.0.76]) by Chow.corp.media.net (Netscape Messaging Server 4.15) with SMTP id HKNGKM00.J4F; Wed, 3 Sep 2003 10:57:58 -0700 From: "Max Clark" To: "Petri Helenius" Date: Wed, 3 Sep 2003 11:08:28 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: <3F56210E.7010206@he.iki.fi> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org Subject: RE: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:02:44 -0000 Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE could address more than 4GB of Ram. - The PAE support allows FreeBSD machines to make use of more than 4 gigabytes of RAM. This functionality was originally written by Jake Burkholder under contract with DARPA and Network Associates Laboratories. Additional changes for individual device drivers will follow in the coming weeks.
If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram for 10TB of disk. Is this correct? Will PAE not function correctly to give me 8GB of Ram? To check 10TB of disk? Is there any way to bypass this requirement and split fsck into smaller chunks? Being able to fsck my disk is kinda important. I have zero experience with either itanium or opteron. What is the current status of support for these processors in FreeBSD? What would the preferred CPU be? Will there be PCI cards that I would not be able to use in either of these systems? Thanks, -Max -----Original Message----- From: Petri Helenius [mailto:pete@he.iki.fi] Sent: Wednesday, September 03, 2003 10:13 AM To: Max Clark Cc: Dan Nelson; freebsd-performance@freebsd.org Subject: Re: 20TB Storage System Max Clark wrote: >- What exactly is the fsck problem? What do I do about it? > > > fsck requires approximately 700K of memory for each gigabyte of disk space. I'm unfortunately not familiar enough with the issue to know how this splits out for blocks and inodes (for example, whether having only a million inodes on a 10TB fs would make it tolerable), but taking the figure presented earlier, you would need 7GB of memory for checking a 10TB filesystem. Having that kind of memory for a single process necessitates a 64-bit system, like sparc64, alpha, itanic or opteron.
Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:16:58 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0D11716A4BF; Wed, 3 Sep 2003 11:16:58 -0700 (PDT) Received: from otter3.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 158EB43FDF; Wed, 3 Sep 2003 11:16:57 -0700 (PDT) (envelope-from anderson@centtech.com) Received: from centtech.com (neutrino.centtech.com [204.177.173.28]) by otter3.centtech.com (8.12.3/8.12.3) with ESMTP id h83IGtob047140; Wed, 3 Sep 2003 13:16:56 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <3F563014.5050504@centtech.com> Date: Wed, 03 Sep 2003 13:16:52 -0500 From: Eric Anderson User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Max Clark References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:16:58 -0000 Max Clark wrote: > Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE > could address more than 4GB of Ram. > > - The PAE support allows FreeBSD machines to make use of more than 4 > gigabytes of RAM. This functionality was originally written by Jake > Burkholder under contract with DARPA and Network Associates Laboratories. > Additional changes for individual device drivers will follow in the coming > weeks. > > If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram > for 10TB of disk. Is this correct? 
Will PAE not function correctly to give > me 8GB of Ram? To check 10TB of disk? > > Is there any way to bypass this requirement and split fsck into smaller > chunks? Being able to fsck my disk is kinda important. Is it possible for you to break up the 10TB partitions into 4TB partitions? If you could ccd those two 10TB RAIDs together into one 20TB ccd'd "drive", then partition that "drive" into five 4TB chunks, you could get away with it knowing that an fsck would take a LONG time, and use up to 3GB of memory... in theory. Eric -- ------------------------------------------------------------------ Eric Anderson Systems Administrator Centaur Technology All generalizations are false, including this one. ------------------------------------------------------------------ From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:23:03 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9A6C116A4BF; Wed, 3 Sep 2003 11:23:03 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2999E43FD7; Wed, 3 Sep 2003 11:23:02 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83IN02k012985; Wed, 3 Sep 2003 21:23:00 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F563183.3080103@he.iki.fi> Date: Wed, 03 Sep 2003 21:22:59 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Max Clark References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System (fsck????)
X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:23:03 -0000 Max Clark wrote: >Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE >could address more than 4GB of Ram. > > It does. However, as long as a pointer is 32 bits, your address space for a process is maxed out at 4G, which translates to about 2.5G user after kernel and other things have taken their toll. >If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram >for 10TB of disk. Is this correct? Will PAE not function correctly to give >me 8GB of Ram? To check 10TB of disk? > PAE functions correctly but does not provide for 7G address space. >Is there any way to bypass this requirement and split fsck into smaller >chunks? Being able to fsck my disk is kinda important. > > Yes, you do that by splitting up the filesystem into smaller filesystems. Kind of obvious? >I have zero experience with either itanium or opteron. What is the current >status of support for these processors in FreeBSD? What would the preferred >CPU be? Will there be PCI cards that I would not be able to use in either of >these systems? > > I'm personally biased towards the Opteron, but that's based more on it making more sense than on technical merits so far (because neither has many). Both CPUs should work fine with 5.2 according to the TODO list. Meanwhile I suggest you play with the number of inodes on the 10TB filesystem and see how that affects the memory usage.
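A rough worked version of the arithmetic in this thread, offered as a sketch only. It assumes the 700K-per-gigabyte figure quoted above scales linearly with filesystem size (the thread itself notes that the split between blocks and inodes is unknown), and it takes the roughly 2.5G of usable 32-bit process address space mentioned above as the ceiling. The function names are made up for illustration and are not part of fsck or FreeBSD.

```python
import math

KB_PER_GB_OF_DISK = 700        # fsck memory per GB of disk, figure quoted in this thread
USER_ADDRESS_SPACE_GB = 2.5    # usable 32-bit process space, figure quoted in this thread


def fsck_memory_gb(fs_size_tb, kb_per_gb=KB_PER_GB_OF_DISK):
    """Estimated fsck memory (in GB) for a filesystem of fs_size_tb terabytes."""
    fs_size_gb = fs_size_tb * 1024
    return fs_size_gb * kb_per_gb / (1024 * 1024)


def max_fs_size_tb(memory_gb=USER_ADDRESS_SPACE_GB, kb_per_gb=KB_PER_GB_OF_DISK):
    """Largest filesystem (in TB) whose estimated fsck footprint fits in memory_gb."""
    fs_size_gb = memory_gb * 1024 * 1024 / kb_per_gb
    return fs_size_gb / 1024


def filesystems_needed(total_tb, memory_gb=USER_ADDRESS_SPACE_GB):
    """How many equal-sized filesystems total_tb would have to be split into."""
    return math.ceil(total_tb / max_fs_size_tb(memory_gb))


if __name__ == "__main__":
    print(f"estimated fsck memory for 10TB: {fsck_memory_gb(10):.1f} GB")
    print(f"largest filesystem for a 2.5GB process: {max_fs_size_tb():.2f} TB")
    print(f"filesystems needed for 10TB: {filesystems_needed(10)}")
```

Under these assumptions the 10TB estimate comes out a little under 7GB, and the per-filesystem ceiling lands between 3TB and 4TB, which is in line with the 3TB rule of thumb quoted earlier in the thread. A smaller inode count may well change the per-gigabyte figure, so the numbers are only a starting point for the test suggested above.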
Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:38:06 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E715C16A4BF; Wed, 3 Sep 2003 11:38:06 -0700 (PDT) Received: from pop018.verizon.net (pop018pub.verizon.net [206.46.170.212]) by mx1.FreeBSD.org (Postfix) with ESMTP id E143743FEC; Wed, 3 Sep 2003 11:38:05 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([68.237.14.199]) by pop018.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030903183805.TSH11703.pop018.verizon.net@mac.com>; Wed, 3 Sep 2003 13:38:05 -0500 Message-ID: <3F5634FE.9080303@mac.com> Date: Wed, 03 Sep 2003 14:37:50 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 References: In-Reply-To: X-Enigmail-Version: 0.76.5.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at pop018.verizon.net from [68.237.14.199] at Wed, 3 Sep 2003 13:38:04 -0500 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:38:07 -0000 Max Clark wrote: > Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE > could address more than 4GB of Ram. It can. PAE lets the hardware address more than 4GB of RAM, but that doesn't change how much memory you can give to any one process: a 32-bit process still has a 32-bit virtual address space. 
> If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram > for 10TB of disk. Is this correct? Will PAE not function correctly to give > me 8GB of Ram? To check 10TB of disk? Another thread suggests that the maximum amount of memory actually available for a 32-bit process to use under FreeBSD is a little less than 3 GB. > Is there any way to bypass this requirement and split fsck into smaller > chunks? Being able to fsck my disk is kinda important. Sure. Create multiple filesystems rather than just one, and use symlinks to make the directory namespace fit your needs. I don't know enough about your tasks to give you really specific advice, but I'm wary of the write-performance hit from putting too many drives wide in a RAID-5 (or -5,0) configuration. If you can split up your data by role or typical access pattern, you might well be able to identify some chunks that will be read-mostly (and RAID-5,0 is a good fit) and others that will be read-write or even write-mostly (and thus should be on -1,0). You can also tune other things like blocksize, # of inodes, and so forth more appropriately for each filesystem.
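The "multiple filesystems plus symlinks" suggestion above can be sketched as follows. This is only an illustration: the helper name `link_into_namespace` and the directory names are hypothetical, and temporary directories stand in for real mount points so that the example can be run anywhere without touching actual storage.

```python
import os
import tempfile


def link_into_namespace(namespace_dir, mounts):
    """Create one symlink per filesystem under namespace_dir.

    mounts maps a link name (for example "archive") to the directory where
    that filesystem is mounted. Returns the list of link paths created.
    """
    os.makedirs(namespace_dir, exist_ok=True)
    links = []
    for name, mount_point in mounts.items():
        link_path = os.path.join(namespace_dir, name)
        os.symlink(mount_point, link_path)  # link_path points at mount_point
        links.append(link_path)
    return links


if __name__ == "__main__":
    # Temporary directories stand in for real mount points
    # (the names data1 and data2 are hypothetical, not from the thread).
    root = tempfile.mkdtemp()
    fs_a = os.path.join(root, "data1")
    fs_b = os.path.join(root, "data2")
    os.mkdir(fs_a)
    os.mkdir(fs_b)
    for path in link_into_namespace(os.path.join(root, "storage"),
                                    {"archive": fs_a, "incoming": fs_b}):
        print(path, "->", os.readlink(path))
```

Each filesystem stays small enough to check on its own, while applications see a single directory tree.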
-- -Chuck From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:45:11 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E562216A4BF for ; Wed, 3 Sep 2003 11:45:11 -0700 (PDT) Received: from alternator.sgh.waw.pl (alternator.sgh.waw.pl [194.145.96.100]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8DB2D43FE5 for ; Wed, 3 Sep 2003 11:45:08 -0700 (PDT) (envelope-from chopin@sgh.waw.pl) Received: from localhost (localhost [127.0.0.1]) by alternator.sgh.waw.pl (Postfix) with SMTP id 80BD52AB100 for ; Wed, 3 Sep 2003 20:45:07 +0200 (CEST) Received: from akson.sgh.waw.pl (akson.sgh.waw.pl [194.145.96.12]) by alternator.sgh.waw.pl (Postfix) with ESMTP id 78B2A2AB0CC for ; Wed, 3 Sep 2003 20:45:07 +0200 (CEST) Received: by akson.sgh.waw.pl (Postfix, from userid 100) id 30FB775A7A; Wed, 3 Sep 2003 20:45:07 +0200 (MET DST) Date: Wed, 3 Sep 2003 20:45:07 +0200 From: Piotr KUCHARSKI To: freebsd-performance@freebsd.org Message-ID: <20030903184506.GC14797@sgh.waw.pl> Mail-Followup-To: Piotr KUCHARSKI , freebsd-performance@freebsd.org References: <3F5634FE.9080303@mac.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3F5634FE.9080303@mac.com> User-Agent: Mutt/1.4i Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:45:12 -0000 On Wed, Sep 03, 2003 at 02:37:50PM -0400, Chuck Swiger wrote: > I'm wary of the write-performance hit from putting too many drives wide > in a RAID-5 (or -5,0) configuration. How many is "too many"? Or, rather, what are write-performance penalties when using sixteen disks in one hw raid5 set? 
(With two raid volumes, 2TB and 1.75TB available for OS.) p. -- Beware of he who would deny you access to information, for in his heart he dreams himself your master. -- Commissioner Pravin Lal http://nerdquiz.sgh.waw.pl/ -- polska wersja quizu dla nerdów ;) From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:52:55 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 870B416A4BF for ; Wed, 3 Sep 2003 11:52:55 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 40F6743F85 for ; Wed, 3 Sep 2003 11:52:52 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83Iqo2k013124; Wed, 3 Sep 2003 21:52:50 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F563881.70605@he.iki.fi> Date: Wed, 03 Sep 2003 21:52:49 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Piotr KUCHARSKI References: <3F5634FE.9080303@mac.com> <20030903184506.GC14797@sgh.waw.pl> In-Reply-To: <20030903184506.GC14797@sgh.waw.pl> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 8bit cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:52:55 -0000 Piotr KUCHARSKI wrote: >How many is "too many"? Or, rather, what are write-performance penalties >when using sixteen disks in one hw raid5 set? (With two raid volumes, >2TB and 1.75TB available for OS.) 
> > With raid5 you read one and write two disks for each write. Wide arrays really hurt when running in degraded mode, but I don't see what the issue is when running fully operational. Though people usually expect raid to deliver some performance when degraded, so running raid50 instead of a too-wide raid5 is usually a good idea. (50 being striped raid5 arrays) Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 12:37:52 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4285F16A4BF for ; Wed, 3 Sep 2003 12:37:52 -0700 (PDT) Received: from pop017.verizon.net (pop017pub.verizon.net [206.46.170.210]) by mx1.FreeBSD.org (Postfix) with ESMTP id DE08E43FBD for ; Wed, 3 Sep 2003 12:37:50 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([68.237.14.199]) by pop017.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030903193750.BTCA27671.pop017.verizon.net@mac.com> for ; Wed, 3 Sep 2003 14:37:50 -0500 Message-ID: <3F5642FF.6060702@mac.com> Date: Wed, 03 Sep 2003 15:37:35 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 Cc: freebsd-performance@freebsd.org References: <3F5634FE.9080303@mac.com> <20030903184506.GC14797@sgh.waw.pl> In-Reply-To: <20030903184506.GC14797@sgh.waw.pl> X-Enigmail-Version: 0.76.5.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at pop017.verizon.net from [68.237.14.199] at Wed, 3 Sep 2003 14:37:49 -0500 Subject: Re: 20TB Storage System (fsck????)
X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 19:37:52 -0000 Piotr KUCHARSKI wrote: > On Wed, Sep 03, 2003 at 02:37:50PM -0400, Chuck Swiger wrote: >>I'm wary of the write-performance hit from putting too many drives wide >>in a RAID-5 (or -5,0) configuration. > > How many is "too many"? At one point, the advice used to be to use between four and seven disks for a RAID-5 volume. For example, the Apple XServe RAID box has 14 bays, but Apple seems to recommend configuring it as two 7-drive RAID-5 volumes, rather than a single 14-drive-wide RAID-5 volume. > Or, rather, what are write-performance penalties > when using sixteen disks in one hw raid5 set? (With two raid volumes, > 2TB and 1.75TB available for OS.) Find yourself a buncha small files-- a CVS repository, or /usr/ports will do, and compare write performance on a single drive versus RAID-5. Basically, you get all of the drives in the RAID-set scribbling away at a fraction of the write speed of a single drive, yes? Three disk transactions per write, versus one? Also note that all this disk activity requires three times the I/O bandwidth, interrupts, and assorted overhead. If the OP has hardware RAID which is designed to support a wide array, OK, but setting up too wide a RAID-5 array means that things like the system bus may bottleneck performance, rather than the drives. Normally, disk I/O speed is the limiting factor, and your bus and memory are sitting around waiting for the DMA to complete (well, being used by the CPU to run other processes). Let's put it this way: things don't go faster when the drives are waiting for the bus to become available, rather than vice-versa. 
:-) -- -Chuck From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 12:47:40 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F167D16A4BF for ; Wed, 3 Sep 2003 12:47:40 -0700 (PDT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29AFE43FDF for ; Wed, 3 Sep 2003 12:47:40 -0700 (PDT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.9/8.12.3) with ESMTP id h83JlY7c018563; Wed, 3 Sep 2003 12:47:34 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.9/8.12.3/Submit) id h83JlYp6018562; Wed, 3 Sep 2003 12:47:34 -0700 Date: Wed, 3 Sep 2003 12:47:34 -0700 From: Brooks Davis To: Chuck Swiger Message-ID: <20030903194734.GA7936@Odin.AC.HMC.Edu> References: <3F5634FE.9080303@mac.com> <20030903184506.GC14797@sgh.waw.pl> <3F5642FF.6060702@mac.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="XsQoSWH+UP9D9v3l" Content-Disposition: inline In-Reply-To: <3F5642FF.6060702@mac.com> User-Agent: Mutt/1.5.4i X-Virus-Scanned: by amavisd-milter (http://amavis.org/) on odin.ac.hmc.edu cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System (fsck????) 
X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 19:47:41 -0000 --XsQoSWH+UP9D9v3l Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Sep 03, 2003 at 03:37:35PM -0400, Chuck Swiger wrote: > Piotr KUCHARSKI wrote: > >On Wed, Sep 03, 2003 at 02:37:50PM -0400, Chuck Swiger wrote: > >>I'm wary of the write-performance hit from putting too many drives wide > >>in a RAID-5 (or -5,0) configuration. > > > >How many is "too many"? > > At one point, the advice used to be to use between four and seven disks for > a RAID-5 volume. For example, the Apple XServe RAID box has 14 bays, but > Apple seems to recommend configuring it as two 7-drive RAID-5 volumes, > rather than a single 14-drive-wide RAID-5 volume. As far as I can tell from taking one of ours apart, that's not a recommendation, that's a hard limit due to the system design. The XServe RAID is two 100% independent 7-disk RAID systems. If you look at the controller boards, each one has four High-Point ATA controllers which means it can only access 8 disks with reasonable performance. There appears to be no communication between the halves of the systems. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE. 
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --XsQoSWH+UP9D9v3l Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQE/VkVVXY6L6fI4GtQRAjH7AJ0Wy7SliPnuHDctJtcZ4o2XA7B9JACgtVwq rl01HZTF3w6O3FU1qMi2BTY= =K9k3 -----END PGP SIGNATURE----- --XsQoSWH+UP9D9v3l-- From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 04:10:26 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5D75616A4BF; Wed, 3 Sep 2003 04:10:26 -0700 (PDT) Received: from chuggalug.clues.com (chuggalug.demon.co.uk [62.49.17.236]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1FF8F43FFB; Wed, 3 Sep 2003 04:10:25 -0700 (PDT) (envelope-from geoffb@chuggalug.clues.com) Received: from chuggalug.clues.com (localhost [127.0.0.1]) by chuggalug.clues.com (8.12.9/8.12.8) with ESMTP id h83B6GVp030367; Wed, 3 Sep 2003 11:06:16 GMT (envelope-from geoffb@chuggalug.clues.com) Received: (from geoffb@localhost) by chuggalug.clues.com (8.12.9/8.12.8/Submit) id h83B6Fag030366; Wed, 3 Sep 2003 11:06:15 GMT Date: Wed, 3 Sep 2003 11:06:15 +0000 From: Geoff Buckingham To: Max Clark Message-ID: <20030903110615.GA25233@chuggalug.clues.com> References: <20030902224136.GA98381@dan.emsphone.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.1i X-Mailman-Approved-At: Wed, 03 Sep 2003 12:51:27 -0700 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 11:10:26 -0000 On Tue, Sep 02, 2003 at 03:53:53PM -0700, Max Clark 
wrote: > Depends on whether you plan on crashing or not :) According to > http://lists.freebsd.org/pipermail/freebsd-fs/2003-July/000181.html, > you may not want to create filesystems over 3TB if you want fsck to > succeed. I don't know if that's using the default newfs settings > (which would create an insane number of inodes), though. > > - This is a big problem (no pun intended), my smallest requirement is still > 5TB... what would you recommend? The smallest file on the storage will be > 500MB. > If your files are all going to be this large, I imagine you should look carefully at what you do with inodes, block and cluster sizes. However, I just read the newfs man page and am intrigued to know what effect the -g and -h options have.... -g avgfilesize The expected average file size for the file system. -h avgfpdir The expected average number of files per directory on the file system. From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 06:23:11 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7412B16A4BF; Wed, 3 Sep 2003 06:23:11 -0700 (PDT) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id B1C2D43FEA; Wed, 3 Sep 2003 06:23:09 -0700 (PDT) (envelope-from dwmalone@maths.tcd.ie) Received: from lanczos.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 3 Sep 2003 14:23:08 +0100 (BST) Date: Wed, 3 Sep 2003 14:23:03 +0100 From: David Malone To: Geoff Buckingham Message-ID: <20030903132303.GA53246@lanczos.maths.tcd.ie> References: <20030902224136.GA98381@dan.emsphone.com> <20030903110615.GA25233@chuggalug.clues.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030903110615.GA25233@chuggalug.clues.com> User-Agent: Mutt/1.5.3i Sender: dwmalone@maths.tcd.ie X-Mailman-Approved-At: Wed, 03 Sep 2003 12:51:27 -0700 cc: 
freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Dan Nelson cc: freebsd-questions@freebsd.org cc: Max Clark Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 13:23:11 -0000 On Wed, Sep 03, 2003 at 11:06:15AM +0000, Geoff Buckingham wrote: > However I just read the newfs man page and am intrigued to know what effect > the -g and -h options have.... > > -g avgfilesize > The expected average file size for the file system. > > -h avgfpdir > The expected average number of files per directory on the file > system. I believe these are used by the dirpref stuff to decide how to distribute files and directories evenly throughout the disk. David. From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 11:20:51 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 47E3B16A4C1; Wed, 3 Sep 2003 11:20:51 -0700 (PDT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id 84E4B43F85; Wed, 3 Sep 2003 11:20:49 -0700 (PDT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (IDENT:brdavis@localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.12.9/8.12.3) with ESMTP id h83IKl7c028675; Wed, 3 Sep 2003 11:20:47 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.12.9/8.12.3/Submit) id h83IKl6s028674; Wed, 3 Sep 2003 11:20:47 -0700 Date: Wed, 3 Sep 2003 11:20:47 -0700 From: Brooks Davis To: Max Clark Message-ID: <20030903182046.GA6161@Odin.AC.HMC.Edu> References: <3F56210E.7010206@he.iki.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.4i X-Virus-Scanned: by amavisd-milter 
(http://amavis.org/) on odin.ac.hmc.edu X-Mailman-Approved-At: Wed, 03 Sep 2003 12:51:27 -0700 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Petri Helenius Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 18:20:51 -0000 [Please, please, please fix your mailer to quote properly. It's very difficult to read your messages.] On Wed, Sep 03, 2003 at 11:08:28AM -0700, Max Clark wrote: > Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE > could address more than 4GB of Ram. > > - The PAE support allows FreeBSD machines to make use of more than 4 > gigabytes of RAM. This functionality was originally written by Jake > Burkholder under contract with DARPA and Network Associates Laboratories. > Additional changes for individual device drivers will follow in the coming > weeks. > > If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram > for 10TB of disk. Is this correct? Will PAE not function correctly to give > me 8GB of Ram? To check 10TB of disk? PAE increases the amount of RAM available, but does nothing to increase the address space, so a given process may not address more than 2GB of RAM. -- Brooks -- Any statement of the form "X is the one, true Y" is FALSE. 
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 12:59:13 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CE18E16A4BF; Wed, 3 Sep 2003 12:59:13 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8C79B43FA3; Wed, 3 Sep 2003 12:59:07 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83Jwi2k013456; Wed, 3 Sep 2003 22:58:44 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F5647F3.5080502@he.iki.fi> Date: Wed, 03 Sep 2003 22:58:43 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Geoff Buckingham References: <20030902224136.GA98381@dan.emsphone.com> <20030903110615.GA25233@chuggalug.clues.com> In-Reply-To: <20030903110615.GA25233@chuggalug.clues.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Dan Nelson cc: freebsd-questions@freebsd.org cc: Max Clark Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 19:59:14 -0000 Geoff Buckingham wrote: >>- This is a big problem (no pun intended), my smallest requirement is still >>5TB... what would you recommend? The smallest file on the storage will be >>500MB. 
>> >> >> >If you files are all going this large I imagine you should look carefully at >what you do with inodes, block and cluster sizes > > The fsck problem should go away with fewer inodes and fewer blocks: if I read the code correctly, memory is consumed according to used inodes and blocks, so having something like 20000 inodes and 64k blocks should allow you to build 5-20T filesystems and actually fsck them. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:07:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7B29A16A4C1 for ; Wed, 3 Sep 2003 13:07:27 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2F4FB440A7 for ; Wed, 3 Sep 2003 13:07:06 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h83K71i8064331; Wed, 3 Sep 2003 22:07:02 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: Petri Helenius From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 03 Sep 2003 22:58:43 +0300." 
<3F5647F3.5080502@he.iki.fi> Date: Wed, 03 Sep 2003 22:07:01 +0200 Message-ID: <64330.1062619621@critter.freebsd.dk> cc: Max Clark cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Geoff Buckingham cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:07:27 -0000 In message <3F5647F3.5080502@he.iki.fi>, Petri Helenius writes: >fsck problem should be gone with less inodes and less blocks since if >I read the code correctly, memory is consumed according to used inodes >and blocks so having like 20000 inodes and 64k blocks should allow >you to build 5-20T filesystem and actually fsck them. I am not sure I would advocate 64k blocks yet. I tend to stick with 32k block, 4k fragment myself. This is a problem which is in the cross-hairs for 6.x -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
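
For a rough sense of scale, the rule of thumb quoted earlier in this thread (about 700K of fsck memory per 1GB of disk) can be turned into a quick calculation, alongside the number of fragments a filesystem contains at a few block/fragment settings. The Python below is only a back-of-envelope sketch: the 700K-per-GB constant is the thread's figure, not something measured from the fsck code, the function names are made up for illustration, and the fragment counts only suggest how the bookkeeping might shrink if, as proposed above, memory use follows the number of blocks.

```python
# Back-of-envelope only: the 700K-per-GB figure is the rule of thumb quoted
# in this thread, not a value taken from the fsck source.
KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4

FSCK_BYTES_PER_GB = 700 * KB  # assumption taken from the thread


def fsck_memory_estimate(fs_bytes):
    """Estimated fsck memory in bytes for a filesystem of fs_bytes."""
    return fs_bytes // GB * FSCK_BYTES_PER_GB


def fragment_count(fs_bytes, frag_bytes):
    """Number of fragments of frag_bytes that fit in fs_bytes."""
    return fs_bytes // frag_bytes


if __name__ == "__main__":
    for size_tb in (5, 10, 20):
        est = fsck_memory_estimate(size_tb * TB)
        print(f"{size_tb:>2} TB: ~{est / GB:.1f} GB of fsck memory")
    for block_kb, frag_kb in ((16, 2), (32, 4), (64, 8)):
        frags = fragment_count(5 * TB, frag_kb * KB)
        print(f"{block_kb}k/{frag_kb}k on 5 TB: {frags:,} fragments")
```

With these assumptions a 10TB filesystem comes out a little under 7GB, in line with the figure discussed in the thread, and each doubling of the fragment size halves the number of fragments.
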
From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:24:27 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CB25E16A4C0 for ; Wed, 3 Sep 2003 13:24:27 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1D27643FDF for ; Wed, 3 Sep 2003 13:24:25 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83KON2k013574; Wed, 3 Sep 2003 23:24:23 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F564DF6.3090200@he.iki.fi> Date: Wed, 03 Sep 2003 23:24:22 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Poul-Henning Kamp References: <64330.1062619621@critter.freebsd.dk> In-Reply-To: <64330.1062619621@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: Max Clark cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Geoff Buckingham cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:24:27 -0000 Poul-Henning Kamp wrote: >I am not sure I would advocate 64k blocks yet. > > Good to know, I have stuck with 16k so far due to the fact that our database has pagesize of 16k and I found little benefit tuning that. (but it´s completely different application) >I tend to stick with 32k block, 4k fragment myself. > >This is a problem which is in the cross-hairs for 6.x > > You have any insight into the fsck memory consumption? 
I remember getting myself saved quite a long time ago by reducing the number of inodes. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:28:38 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0939C16A4BF for ; Wed, 3 Sep 2003 13:28:38 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id D84E243FF7 for ; Wed, 3 Sep 2003 13:28:36 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h83KSZi8064485; Wed, 3 Sep 2003 22:28:35 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: Petri Helenius From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 03 Sep 2003 23:24:22 +0300." <3F564DF6.3090200@he.iki.fi> Date: Wed, 03 Sep 2003 22:28:35 +0200 Message-ID: <64484.1062620915@critter.freebsd.dk> cc: Max Clark cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Geoff Buckingham cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:28:38 -0000 In message <3F564DF6.3090200@he.iki.fi>, Petri Helenius writes: >You have any insight into the fsck memory consumption? I remember getting >myself saved quite a long time ago by reducing the number of inodes. I have not studied it. I always try to avoid having more than an order of magnitude more inodes than I need, it also saves fsck time. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
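
The "no more than an order of magnitude more inodes than I need" guideline above can be worked through for the case in this thread (a 5TB filesystem whose smallest file is 500MB). The sketch below is an illustration under stated assumptions, not a recipe: the helper names are hypothetical, the tenfold headroom is just the upper end of the guideline, and the resulting bytes-per-inode figure is the kind of density value newfs accepts (its -i option); newfs may round it per cylinder group, so the real inode count can differ.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def max_files(fs_bytes, smallest_file_bytes):
    """Upper bound on the file count if every file is at least this large."""
    return fs_bytes // smallest_file_bytes


def inode_density(fs_bytes, files_expected, headroom=10):
    """Bytes per inode that leaves `headroom` times the expected inode count."""
    return fs_bytes // (files_expected * headroom)


if __name__ == "__main__":
    fs = 5 * TB
    files = max_files(fs, 500 * MB)     # most files the filesystem could hold
    density = inode_density(fs, files)  # candidate bytes-per-inode value
    print(f"at most {files} files")
    print(f"bytes per inode: {density}")
    print(f"inodes at that density: {fs // density}")
```

The point of the exercise is the scale: roughly ten thousand files at most, so an inode count in the low hundreds of thousands already leaves tenfold headroom.
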
From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:30:45 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C709816A4BF for ; Wed, 3 Sep 2003 13:30:45 -0700 (PDT) Received: from otter3.centtech.com (moat3.centtech.com [207.200.51.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id E595243FBD for ; Wed, 3 Sep 2003 13:30:44 -0700 (PDT) (envelope-from anderson@centtech.com) Received: from centtech.com (neutrino.centtech.com [204.177.173.28]) by otter3.centtech.com (8.12.3/8.12.3) with ESMTP id h83KUiob067498; Wed, 3 Sep 2003 15:30:44 -0500 (CDT) (envelope-from anderson@centtech.com) Message-ID: <3F564F70.1000905@centtech.com> Date: Wed, 03 Sep 2003 15:30:40 -0500 From: Eric Anderson User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Poul-Henning Kamp References: <64484.1062620915@critter.freebsd.dk> In-Reply-To: <64484.1062620915@critter.freebsd.dk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org cc: Petri Helenius Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:30:45 -0000 Poul-Henning Kamp wrote: > In message <3F564DF6.3090200@he.iki.fi>, Petri Helenius writes: > > >>You have any insight into the fsck memory consumption? I remember getting >>myself saved quite a long time ago by reducing the number of inodes. > > > I have not studied it. I always try to avoid having more than an > order of magnitude more inodes than I need, it also saves fsck time. > So what's the appropriate way to calculate what blocksize and how many inodes you should use? 
Eric -- ------------------------------------------------------------------ Eric Anderson Systems Administrator Centaur Technology All generalizations are false, including this one. ------------------------------------------------------------------ From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:34:21 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54CE516A4BF for ; Wed, 3 Sep 2003 13:34:21 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 56BBD43FAF for ; Wed, 3 Sep 2003 13:34:20 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h83KYGi8064561; Wed, 3 Sep 2003 22:34:17 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: Eric Anderson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 03 Sep 2003 15:30:40 CDT." <3F564F70.1000905@centtech.com> Date: Wed, 03 Sep 2003 22:34:16 +0200 Message-ID: <64560.1062621256@critter.freebsd.dk> cc: freebsd-performance@freebsd.org cc: Petri Helenius Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:34:21 -0000 In message <3F564F70.1000905@centtech.com>, Eric Anderson writes: >Poul-Henning Kamp wrote: >> In message <3F564DF6.3090200@he.iki.fi>, Petri Helenius writes: >> >> >>>You have any insight into the fsck memory consumption? I remember getting >>>myself saved quite a long time ago by reducing the number of inodes. >> >> >> I have not studied it. I always try to avoid having more than an >> order of magnitude more inodes than I need, it also saves fsck time. 
>> > >So what's the appropriate way to calculate what blocksize and how many >inodes you should use? "Know your data" :-/ "df -i" will report both block and inode usage for a filesystem. You adjust the number of inodes by specifying the expected average number of bytes per inode (== bytes_used / inodes_used) to newfs. For block/fragment I have no heuristics, but I think 32/4 is a good all-round setting for multi-GB disks. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:41:08 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E50E316A4BF for ; Wed, 3 Sep 2003 13:41:08 -0700 (PDT) Received: from silver.he.iki.fi (helenius.fi [193.64.42.241]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC00843FEA for ; Wed, 3 Sep 2003 13:41:07 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h81.vuokselantie10.fi [193.64.42.129]) by silver.he.iki.fi (8.12.9/8.11.4) with ESMTP id h83Kew2k013685; Wed, 3 Sep 2003 23:40:59 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <3F5651D9.1030701@he.iki.fi> Date: Wed, 03 Sep 2003 23:40:57 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Eric Anderson References: <64484.1062620915@critter.freebsd.dk> <3F564F70.1000905@centtech.com> In-Reply-To: <3F564F70.1000905@centtech.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: Poul-Henning Kamp cc: freebsd-performance@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:41:09 -0000 Eric Anderson wrote: > > So what's the appropriate way to calculate what blocksize and how many > inodes you should use? > I might be wrong but for specific applications you just know the block size based on what the application uses. For the rest, look for the average IO size on systat´s vmstat display. For inodes, see how many you have and do 2-4 times the number you think you need, they are little complicated to add afterwards. Pete From owner-freebsd-performance@FreeBSD.ORG Wed Sep 3 13:43:18 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F09AE16A4BF for ; Wed, 3 Sep 2003 13:43:18 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id BCDFD43FEC for ; Wed, 3 Sep 2003 13:43:17 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h83KhEi8064639; Wed, 3 Sep 2003 22:43:14 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: Petri Helenius From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 03 Sep 2003 23:40:57 +0300." <3F5651D9.1030701@he.iki.fi> Date: Wed, 03 Sep 2003 22:43:14 +0200 Message-ID: <64638.1062621794@critter.freebsd.dk> cc: freebsd-performance@freebsd.org cc: Eric Anderson Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Sep 2003 20:43:19 -0000 In message <3F5651D9.1030701@he.iki.fi>, Petri Helenius writes: >For inodes, see how many you have and do 2-4 times the number you think >you need, they are little complicated to add afterwards. 
Congratulations! You just won the "Understatement of the Month" award :-) Poul-Henning (who wish he could afford to have multi TB issues himself) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
_______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" From owner-freebsd-performance@FreeBSD.ORG Thu Sep 4 01:14:33 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6B7B216A4BF; Thu, 4 Sep 2003 01:14:33 -0700 (PDT) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 40D7743FF5; Thu, 4 Sep 2003 01:14:32 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38lc0e8.dialup.mindspring.com ([209.86.1.200] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19upG3-0006YA-00; Thu, 04 Sep 2003 01:14:24 -0700 Message-ID: <3F56F3FD.C636781@mindspring.com> Date: Thu, 04 Sep 2003 01:12:45 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Max Clark References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a45da0b3071c5aaf85155b6e8415be78b693caf27dac41a8fd350badd9bab72f9c350badd9bab72f9c cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Petri Helenius Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2003 08:14:33 -0000 Max Clark wrote: > Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE > could address more than 4GB of Ram. The kernel being able to address the RAM does not mean that the KVA+UVA space is larger than 4G.
At best, you could take the uiomove/copyin/copyout performance hit, and move both of these to 4G each, rather than 4G total. That still limits you to 4G.

> If fsck requires 700K for each 1GB of Disk, we are talking about 7GB of Ram
> for 10TB of disk. Is this correct? Will PAE not function correctly to give
> me 8GB of Ram?

To check 10TB of disk? No, it will not.

> Is there anyway to bypass this requirement and split fsck into smaller
> chunks? Being able to fsck my disk is kinda important.

Yes. Limit the number of CG bitmaps you examine simultaneously, and make the operation multi-pass over the disk. This is not that hard a modification to fsck, and it can be done fairly quickly by anyone who understands the code. The cost in time to fsck the disk will go up inversely proportionally to the amount of RAM it's allowed to use, which is limited to the UVA size minus the fsck program size itself, and the fsck buffers used for things like FS metadata for a given file/directory.

> I have zero experience with either itanium or opteron. What is the current
> status of support for these processors in FreeBSD? What would the preferred
> CPU be? Will there be PCI cards that I would not be able to use in either of
> these systems?

I have no idea whether these systems support a larger UVA size, or how much memory you could jam into them...
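For what it's worth, the arithmetic behind the 700K-per-GB figure can be sketched as below. This is only a back-of-the-envelope sketch: it assumes the quoted figure scales linearly with disk size, and the 2 GB usable address space is an illustrative assumption, not a measured number. The function names are made up for the example.

```python
import math

# Figure quoted in the thread: roughly 700 KB of fsck memory per 1 GB of disk.
# Treating it as a linear rule is an assumption made for this sketch.
FSCK_KB_PER_DISK_GB = 700


def fsck_memory_kb(disk_gb):
    """Estimated fsck working set, in KB, for a disk of disk_gb gigabytes."""
    return disk_gb * FSCK_KB_PER_DISK_GB


def passes_needed(disk_gb, usable_kb):
    """Number of passes if each pass may only use usable_kb of memory."""
    return math.ceil(fsck_memory_kb(disk_gb) / usable_kb)


disk_gb = 10 * 1024                      # 10 TB expressed in GB
need_kb = fsck_memory_kb(disk_gb)        # about 6.8 GB, i.e. the "7GB" above
usable_kb = 2 * 1024 * 1024              # hypothetical 2 GB of usable UVA
print(need_kb, "KB needed;", passes_needed(disk_gb, usable_kb), "passes")
```

With less usable memory the pass count rises, which is the "time goes up as RAM goes down" trade-off described above.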
-- Terry From owner-freebsd-performance@FreeBSD.ORG Thu Sep 4 04:06:03 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B869C16A4BF; Thu, 4 Sep 2003 04:06:03 -0700 (PDT) Received: from chuggalug.clues.com (chuggalug.demon.co.uk [62.49.17.236]) by mx1.FreeBSD.org (Postfix) with ESMTP id 90F9C43FDD; Thu, 4 Sep 2003 04:06:02 -0700 (PDT) (envelope-from geoffb@chuggalug.clues.com) Received: from chuggalug.clues.com (localhost [127.0.0.1]) by chuggalug.clues.com (8.12.9/8.12.8) with ESMTP id h84B1vVp035853; Thu, 4 Sep 2003 11:01:57 GMT (envelope-from geoffb@chuggalug.clues.com) Received: (from geoffb@localhost) by chuggalug.clues.com (8.12.9/8.12.8/Submit) id h84B1uDu035852; Thu, 4 Sep 2003 11:01:56 GMT Date: Thu, 4 Sep 2003 11:01:55 +0000 From: Geoff Buckingham To: Terry Lambert Message-ID: <20030904110155.GA35273@chuggalug.clues.com> References: <3F56F3FD.C636781@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3F56F3FD.C636781@mindspring.com> User-Agent: Mutt/1.4.1i cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Petri Helenius cc: Max Clark Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2003 11:06:03 -0000 On Thu, Sep 04, 2003 at 01:12:45AM -0700, Terry Lambert wrote: > > Yes. Limit the number of CG bitmaps you examine simultaneously, > and make the operation multiple pass over the disk. This is not > that hard a modification to fsck, and it can be done fairly > quickly by anyone who understands the code. 
The code in time to > fsck the disk will go up inversely proportionally to the amount > of RAM it's allowed to use, which is limited to the UVA size > minus the fsck program size itself, and the fsck buffers used for > things like FS metadata for a given file/directory. > > Pardon my ignorance but does the number of inodes in the filesystem have a significant impact on the memory requirement of fsck? I ask as it was previously stated the smallest file on the 10TB filesystem would be 500MB which would enable a vastly reduced number of inodes and possibly very large block, fragment and cluster sizes? From owner-freebsd-performance@FreeBSD.ORG Thu Sep 4 11:53:23 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F101116A4BF; Thu, 4 Sep 2003 11:53:23 -0700 (PDT) Received: from rwcrmhc13.comcast.net (rwcrmhc13.comcast.net [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id 30D9B43FFB; Thu, 4 Sep 2003 11:53:22 -0700 (PDT) (envelope-from julian@elischer.org) Received: from interjet.elischer.org ([12.233.125.100]) by attbi.com (rwcrmhc13) with ESMTP id <2003090418532101500r4r2ae>; Thu, 4 Sep 2003 18:53:21 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA41656; Thu, 4 Sep 2003 11:53:20 -0700 (PDT) Date: Thu, 4 Sep 2003 11:53:18 -0700 (PDT) From: Julian Elischer To: Tim Kientzle In-Reply-To: <3F56352F.7050701@acm.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Mailman-Approved-At: Thu, 04 Sep 2003 12:29:37 -0700 cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Petri Helenius cc: Max Clark Subject: Re: 20TB Storage System (fsck????)
X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2003 18:53:24 -0000 On Wed, 3 Sep 2003, Tim Kientzle wrote: > Max Clark wrote: > > Ohh, that's an interesting snag. I was under the impression that 5.x w/ PAE > > could address more than 4GB of Ram. > > That's >4G of memory in the system. 32-bit processors > are still limited to 4G processor address space, which means > <3G per process (allowing some memory for kernel operations). > You can't get around that unless you either go for a 64-bit > processor or do some complex coding to break your application > storage across multiple processes. It's worse than that, because I think that to handle >4GB of RAM you need to limit your processes to about 2G of virtual space. From owner-freebsd-performance@FreeBSD.ORG Thu Sep 4 15:07:24 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 86FCD16A4BF for ; Thu, 4 Sep 2003 15:07:24 -0700 (PDT) Received: from flake.decibel.org (flake.decibel.org [66.143.173.58]) by mx1.FreeBSD.org (Postfix) with SMTP id 355A943FAF for ; Thu, 4 Sep 2003 15:07:21 -0700 (PDT) (envelope-from decibel@decibel.org) Received: (qmail 33822 invoked by uid 1001); 4 Sep 2003 22:07:10 -0000 Date: Thu, 4 Sep 2003 17:07:09 -0500 From: "Jim C. Nasby" To: freebsd-performance@FreeBSD.ORG Message-ID: <20030904220709.GR37152@nasby.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 4.8-RELEASE-p3 i386 X-Distributed: Join the Effort!
http://www.distributed.net Subject: Best disk caching method (and PGSQL performance) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2003 22:07:24 -0000 After reading the thread at http://www.freebsd.org/cgi/getmsg.cgi?fetch=25060+34014+/usr/local/www/db/text/2003/freebsd-performance/20030727.freebsd-performance and the post at http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&selm=199907290910.CAA06665%40implode.root.com I'm a bit confused about how best to get FBSD to cache disk I/O on a PostgreSQL server. Reading the VM article at http://www.daemonnews.org/200001/freebsd_vm.html has certainly made things a bit clearer, but I want to make sure I'm understanding things before I go tweaking stuff. The inactive queue is only comprised of pages that have been read in as mapped memory, correct? I believe PGSQL does not make use of memory-mapped I/O, so does that mean that the only place data reads are being cached is in the buffer pool? (Note that this contradicts the google link). If that's indeed the case, then it seems like the only way to get a decent amount of data caching is by increasing the buffer size (which apparently means increasing kern.nbuf, which also means increasing KVA_PAGES (though I'm not at all sure about this). If this isn't the case, why have the buffer pool at all? Why not just leave buffered disk pages in the inactive or cache queues? On another note, does anyone know for certain that PGSQL on FBSD is fsyncing only the WAL? I'm seeing a pretty large amount of activity on my primary partition and I'm wondering if it's fsyncing everything. Also, has anyone played with the other fsync options? -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! 
www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" From owner-freebsd-performance@FreeBSD.ORG Thu Sep 4 15:41:57 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9F23A16A4BF for ; Thu, 4 Sep 2003 15:41:57 -0700 (PDT) Received: from perrin.nxad.com (internal.nxad.com [69.1.70.251]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1F69A43FB1 for ; Thu, 4 Sep 2003 15:41:57 -0700 (PDT) (envelope-from sean@nxad.com) Received: by perrin.nxad.com (Postfix, from userid 1001) id 722CA20F00; Thu, 4 Sep 2003 15:41:56 -0700 (PDT) Date: Thu, 4 Sep 2003 15:41:56 -0700 From: Sean Chittenden To: "Jim C. Nasby" Message-ID: <20030904224156.GD75041@perrin.nxad.com> References: <20030904220709.GR37152@nasby.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030904220709.GR37152@nasby.net> X-PGP-Key: finger seanc@FreeBSD.org X-PGP-Fingerprint: 3849 3760 1AFE 7B17 11A0 83A6 DD99 E31F BC84 B341 X-Web-Homepage: http://sean.chittenden.org/ User-Agent: Mutt/1.5.4i cc: freebsd-performance@FreeBSD.ORG Subject: Re: Best disk caching method (and PGSQL performance) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Sep 2003 22:41:57 -0000 > If that's indeed the case, then it seems like the only way to get a > decent amount of data caching is by increasing the buffer size > (which apparently means increasing kern.nbuf, which also means > increasing KVA_PAGES (though I'm not at all sure about this). 
You can increase kern.nbuf and even have kern.nbuf available as a sysctl if you apply the following patch: http://people.freebsd.org/~seanc/patches/patch-HEAD-kern.nbuf One piece of advice I have received is, "I use an nbuf of something like twice the default one, and a BKVASIZE of 4 times the default. vfs.maxbufspace ends up at 445MB on the machine with 1GB, so it is maxed out now." As the sagely Mr. Bruce Evans has pointed out to me, buffer kva = nbuf * BKVASIZE, so while it's not impossible to work out the nbuf level for a machine, it is nice not to have to poke inside header files to find BKVASIZE. > Also, has anyone played with the other fsync options? FreeBSD only supports the default fsync option. -- Sean Chittenden From owner-freebsd-performance@FreeBSD.ORG Fri Sep 5 00:29:34 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EB48416A4BF; Fri, 5 Sep 2003 00:29:33 -0700 (PDT) Received: from stork.mail.pas.earthlink.net (stork.mail.pas.earthlink.net [207.217.120.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id E9F4643FBD; Fri, 5 Sep 2003 00:29:32 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjg5.dialup.mindspring.com ([165.247.206.5] helo=mindspring.com) by stork.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19vB25-0005qK-00; Fri, 05 Sep 2003 00:29:26 -0700 Message-ID: <3F583B1F.313B81DE@mindspring.com> Date: Fri, 05 Sep 2003 00:28:31 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Geoff Buckingham References: <3F56F3FD.C636781@mindspring.com> <20030904110155.GA35273@chuggalug.clues.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a461d86e4e29bcea899dae01341e67b08da2d4e88014a4647c350badd9bab72f9c350badd9bab72f9c cc:
freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Petri Helenius cc: Max Clark Subject: Re: 20TB Storage System (fsck????) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2003 07:29:34 -0000 Geoff Buckingham wrote: > On Thu, Sep 04, 2003 at 01:12:45AM -0700, Terry Lambert wrote: > > Yes. Limit the number of CG bitmaps you examine simultaneously, > > and make the operation multiple pass over the disk. This is not > > that hard a modification to fsck, and it can be done fairly > > quickly by anyone who understands the code. The code in time to > > fsck the disk will go up inversely proportionally to the amount > > of RAM it's allowed to use, which is limited to the UVA size > > minus the fsck program size itself, and the fsck buffers used for > > things like FS metadata for a given file/directory. > > Pardon my ignorance but does the number of inodes in the filesystem have a > significant impact on the memory requirement of fsck? I can't answer empirically, but extrapolating from the empirical data that I *do* have, the time is going to go up proportional to the number of blocks in use, and the number of blocks in use is going to equal the average number of blocks per file times the number of files, and given that there is one inode per file, you will bound the amount of blocks by bounding the number of inodes. This makes the answer "yes, indirectly". What passes get run really depend on how your FS is configured. By default, a background fsck will only check for blocks that are marked as used in the CG bitmaps that are not actually used; so this is a CG bitmap vs. all inodes direct and indirect block lists consistency check only. 
Most of the incremental or multipass techniques I've discussed on the mailing list assume either a full fsck, or that you are able to lock individual CGs, or at least ranges on the disk, if you wish to do a BG check; or that you read-only the entire disk until you are done, and maintain a list of "needs update" items (this can be very compact, since it can be run-length encoded or otherwise highly compressed).

If you read the fsck manual page and understand what it means, you can get some idea of what parameters affect it in what phases:

Inconsistencies checked are as follows:

1. Blocks claimed by more than one inode or the free map.

Every inode needs to be scanned to see what blocks in the free map should not be in the free map. The "free map" is the set of bits in the set of all cylinder group bitmaps.

Cross-checking multiple references for directories is a process of combinatorial math. You take N inodes 2 at a time and compare them.

A trick that is possible, if you are willing to rebuild the CG bitmaps in core, or are willing to double the space for the set you are examining, would be to examine a range, and at the same time keep a shadow. Zero the shadow, and pass the list of inodes once, setting bits in the shadow. If you go to set a bit and it's already set, *then* you go back and find out who it was who had the bit set. This is probably an OK trade-off, particularly if you maintain a list of "this-file-this-suspect-bit", and then pass the FS again (large numbers of cross-linked blocks are rare).

The second of these operations is as expensive as:

#inodes_used*(#inodes_used-1)*(#indirect_blocks**2-1)

2. Blocks claimed by an inode outside the range of the filesystem.

What this really should say is "which are outside the range". In other words, bogus block numbers. This is a compare that can be made during a direct linear search.

3. Incorrect link counts.

This is a directory entry vs. inode count.
The expense of this operation depends on whether you are directory-entry-major or inode-major on your pass, and the relative number of directory entries vs. inodes. For most FS's, the number of entries is going to be ~15% higher than the number of inodes; this is because of the hard links to directories from their parents, and to parent directories from their child directories. This number could be much, much higher on an FS with a large number of hard links per file.

The thing you have to worry about is tracking the number of hard links per inode, and whether you can do this all in memory (e.g. with a linear array of integers of the same type size as the link count, whose length is equal to the number of inodes available in the system), or whether you have to break the job up and pass over the directory structure multiple times. If you can't keep all the items in memory, and must make multiple passes, then it's better to be inode-major; otherwise, it's better to be directory-major.

4. Size checks:

Directory size not a multiple of DIRBLKSIZ.

Simple check; can be done during one of the single linear passes.

Partially truncated file.

Also a linear check, but somewhat harder to handle.

5. Bad inode format.

Self-inconsistent contents on inodes.

6. Blocks not accounted for anywhere.

Every block that's not in the free map and should be, because it's not claimed by a directory or inode. The "free map" is the set of bits in the set of all cylinder group bitmaps. This is the background fsck on an FS with soft updates case.

7. Directory checks:

File pointing to unallocated inode.

Directory says it's there, inode says it's not. Linear pass over the directory space, looking up each inode.

Inode number out of range.

Linear pass over the directory space, looking at each directory entry. Inode is out of range if it's not in the set of inodes per cylinder group times number of cylinder groups.

Directories with unallocated blocks (holes).
Directories are not allowed to be sparse, since they are accessed via block I/O, linearly, from first byte to last, in order to scan for matches on lookup/create/rename/iteration operations.

Dot or dot-dot not the first two entries of a directory or having the wrong inode number.

Internal consistency check. Inode number of child and parent do not match expected corresponding values; this is a one element lookahead hierarchical traversal, so it's not quite linear; best handled by depth-first recursive descent.

8. Super Block checks:

More blocks for inodes than there are in the filesystem.
Bad free block map format.
Total free block and/or free inode count incorrect.

Trivial checks. Free block/free inode are maintained as part of the other checks, so the superblock is kept in core for the duration.

> I ask as it was previously stated the smallest file on the 10TB filessytem
> would be 500MB which would enable a vastley reduced number of inodes and
> possibly very large block fragment and cluster sizes?

The thing that matters is the number of allocated inodes, not the number of total inodes. If they aren't allocated, then they don't have blocks allocated to them, and if they don't have blocks allocated to them, then those blocks don't need to be checked, and neither do the inodes themselves.

Obviously, the above information is just a brief oversimplification, but it gives you a 50,000 foot view of the issues and trade-offs you could make to fit in less RAM. For more detailed information:

The UNIX+ File System Check Program
http://citeseer.nj.nec.com/31351.html

A Fast File System for UNIX (1984)
http://citeseer.nj.nec.com/mckusick84fast.html

Basically, if you want to learn, you're going to have to read. 8-).
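To make the shadow-bitmap trick from check 1 a bit more concrete, here is a minimal sketch. It is not how fsck is actually implemented: the `inodes` mapping (inode number to list of claimed block numbers) and the function name are hypothetical stand-ins for whatever the real code reads off the disk. The first pass only sets bits and records suspect blocks; the second pass goes back to find out who claimed them.

```python
def find_duplicate_claims(inodes, nblocks):
    """Return (owners, out_of_range) for a hypothetical in-memory inode table.

    inodes  -- dict mapping inode number -> list of claimed block numbers
    nblocks -- number of blocks in the range being examined
    """
    shadow = bytearray((nblocks + 7) // 8)   # zeroed shadow bitmap
    suspects = set()                         # blocks seen more than once
    out_of_range = []                        # check 2: bogus block numbers

    # Pass 1: set a bit per claimed block; an already-set bit is a suspect.
    for ino, blocks in inodes.items():
        for blk in blocks:
            if not 0 <= blk < nblocks:
                out_of_range.append((ino, blk))
                continue
            byte, bit = divmod(blk, 8)
            mask = 1 << bit
            if shadow[byte] & mask:
                suspects.add(blk)
            else:
                shadow[byte] |= mask

    # Pass 2: only now find out which inodes claim each suspect block.
    owners = {blk: [] for blk in suspects}
    for ino, blocks in inodes.items():
        for blk in blocks:
            if blk in owners:
                owners[blk].append(ino)
    return owners, out_of_range


table = {2: [10, 11], 3: [11, 12], 4: [99]}
dups, bogus = find_duplicate_claims(table, 64)
print(dups, bogus)
```

The point of the design is that the expensive "who owns this block" search is only paid for the (normally rare) suspect blocks, at the cost of one extra bit per block in the range being examined.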
-- Terry From owner-freebsd-performance@FreeBSD.ORG Fri Sep 5 01:27:34 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B30B116A4BF for ; Fri, 5 Sep 2003 01:27:34 -0700 (PDT) Received: from firecrest.mail.pas.earthlink.net (firecrest.mail.pas.earthlink.net [207.217.121.247]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0D05643F93 for ; Fri, 5 Sep 2003 01:27:34 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjg5.dialup.mindspring.com ([165.247.206.5] helo=mindspring.com) by firecrest.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19vBwH-0000MW-00; Fri, 05 Sep 2003 01:27:29 -0700 Message-ID: <3F5848A2.74AAF606@mindspring.com> Date: Fri, 05 Sep 2003 01:26:10 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Sean Chittenden References: <20030904220709.GR37152@nasby.net> <20030904224156.GD75041@perrin.nxad.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a42c1063ce8e83d3f1b0b243854b6ce92c350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-performance@FreeBSD.ORG cc: "Jim C. Nasby" Subject: Re: Best disk caching method (and PGSQL performance) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2003 08:27:34 -0000 Sean Chittenden wrote: > > Also, has anyone played with the other fsync options? > > FreeBSD only supports the default fsync option. And as the comments point out, it lacks the introspection to know dirty pages from clean ones, so all pages that are in core and associated with the object are written, not just the dirty ones. Avoid this, if possible. 
It would be nice if there were an fcntl such as F_SYNCRANGE, or something similar, so the application could hint to the kernel the range it wanted written. -- Terry From owner-freebsd-performance@FreeBSD.ORG Fri Sep 5 06:40:30 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A862016A4BF; Fri, 5 Sep 2003 06:40:30 -0700 (PDT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3F64343FF9; Fri, 5 Sep 2003 06:40:29 -0700 (PDT) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.9/8.12.9) with ESMTP id h85DeCTI010307; Fri, 5 Sep 2003 15:40:18 +0200 (CEST) (envelope-from phk@phk.freebsd.dk) To: David Gilbert From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 05 Sep 2003 09:23:06 EDT." <16216.36410.889440.499438@canoe.velocet.net> Date: Fri, 05 Sep 2003 15:40:12 +0200 Message-ID: <10306.1062769212@critter.freebsd.dk> cc: Petri Helenius cc: Max Clark cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Geoff Buckingham cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2003 13:40:30 -0000 In message <16216.36410.889440.499438@canoe.velocet.net>, David Gilbert writes: >That reminds me... has anyone thought of designing the system to have >more than 8 frags per block? Increasingly, for large file >performance, we're pushing up the block size dramatically. This is >with the assumption that large disks will contain large files.
> >It strikes me that driving the block size up (as far as 1M) and having >a 256 (or so) fragments might become appropriate. Sounds like a _great_ project for somebody :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Fri Sep 5 15:03:49 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9627C16A4BF; Fri, 5 Sep 2003 15:03:49 -0700 (PDT) Received: from HAL9000.homeunix.com (12-233-57-131.client.attbi.com [12.233.57.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id 804BD44013; Fri, 5 Sep 2003 15:03:48 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.9/8.12.9) with ESMTP id h85M3dG7020167; Fri, 5 Sep 2003 15:03:39 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.9/8.12.9/Submit) id h85M3bxF020166; Fri, 5 Sep 2003 15:03:37 -0700 (PDT) (envelope-from das@FreeBSD.ORG) Date: Fri, 5 Sep 2003 15:03:37 -0700 From: David Schultz To: David Gilbert Message-ID: <20030905220337.GA20142@HAL9000.homeunix.com> Mail-Followup-To: David Gilbert , Poul-Henning Kamp , Petri Helenius , Max Clark , freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Dan Nelson , freebsd-questions@freebsd.org References: <3F5647F3.5080502@he.iki.fi> <64330.1062619621@critter.freebsd.dk> <16216.36410.889440.499438@canoe.velocet.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16216.36410.889440.499438@canoe.velocet.net> cc: Poul-Henning Kamp cc: Petri Helenius cc: Max Clark cc: freebsd-hackers@FreeBSD.ORG cc: freebsd-performance@FreeBSD.ORG cc: Dan Nelson cc: 
freebsd-questions@FreeBSD.ORG Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2003 22:03:49 -0000 On Fri, Sep 05, 2003, David Gilbert wrote: > >>>>> "Poul-Henning" == Poul-Henning Kamp writes: > > Poul-Henning> In message <3F5647F3.5080502@he.iki.fi>, Petri Helenius > Poul-Henning> writes: > >> fsck problem should be gone with less inodes and less blocks since > >> if I read the code correctly, memory is consumed according to used > >> inodes and blocks so having like 20000 inodes and 64k blocks should > >> allow you to build 5-20T filesystem and actually fsck them. > > Poul-Henning> I am not sure I would advocate 64k blocks yet. > > Poul-Henning> I tend to stick with 32k block, 4k fragment myself. > > Poul-Henning> This is a problem which is in the cross-hairs for 6.x > > That reminds me... has anyone thought of designing the system to have > more than 8 frags per block? Increasingly, for large file > performance, we're pushing up the block size dramatically. This is > with the assumption that large disks will contain large files. > > ... but I havn't seem that, myself. Large arrays that we run tend to > have multiple system images (for diskless or semi-diskless operation) > and many more thousands of users ... all with their usual complement > of small files. > > It strikes me that driving the block size up (as far as 1M) and having > a 256 (or so) fragments might become appropriate. > > We probably also need to address disks with larger block sizes soon, > but that's another issue alltogether. To that end, UFS2 is supposed to be able to support ``jumbo blocks''. The code for that isn't in the tree, but I presume Kirk is working on it. 
From owner-freebsd-performance@FreeBSD.ORG Fri Sep 5 22:04:56 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D780516A4BF; Fri, 5 Sep 2003 22:04:56 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 82B3243F85; Fri, 5 Sep 2003 22:04:55 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-38ldthf.dialup.mindspring.com ([209.86.246.47] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 19vVFR-0002RD-00; Fri, 05 Sep 2003 22:04:34 -0700 Message-ID: <3F596AAB.843C86F5@mindspring.com> Date: Fri, 05 Sep 2003 22:03:39 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: David Gilbert References: <3F5647F3.5080502@he.iki.fi> <16216.36410.889440.499438@canoe.velocet.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4e0ec236727729bd695c827fea927f3d7350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: Poul-Henning Kamp cc: Petri Helenius cc: Max Clark cc: freebsd-hackers@freebsd.org cc: freebsd-performance@freebsd.org cc: Dan Nelson cc: freebsd-questions@freebsd.org Subject: Re: 20TB Storage System X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 06 Sep 2003 05:04:57 -0000 David Gilbert wrote: > >>>>> "Poul-Henning" == Poul-Henning Kamp writes: > Poul-Henning> I am not sure I would advocate 64k blocks yet. > Poul-Henning> I tend to stick with 32k block, 4k fragment myself. > > That reminds me... has anyone thought of designing the system to have > more than 8 frags per block? 
Increasingly, for large file > performance, we're pushing up the block size dramatically. This is > with the assumption that large disks will contain large files.

My assumptions on the previous two statements by Poul are:

1) You cannot trust that a short will be treated as an unsigned 16 bit value in all cases, so values that are between 32768 and 65535 may be treated incorrectly.

2) A fully populated block bitmap byte, which means a divide by 8, is necessary to avoid potential division errors.

In other words, he's afraid that the sign bit and/or the block size bitmap used by frags may be treated incorrectly.

I have to agree with both those observations. A number of people have, historically, reported issues with a divisor other than 8, and the worry about the sign bit is common sense, given the many historical issues faced by other OSes when it comes to 64K block sizes.

-- Terry
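As a small illustration of the sign-bit worry in point 1, the sketch below reinterprets a value as a signed or an unsigned 16-bit quantity. It only demonstrates the general hazard; which fields, if any, in the filesystem code are actually held in a short is not something this sketch claims to know, and the helper names are invented for the example.

```python
def as_int16(value):
    """Reinterpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def as_uint16(value):
    """Reinterpret the low 16 bits of value as an unsigned 16-bit integer."""
    return value & 0xFFFF


# 32K survives in both interpretations only if the type is unsigned;
# 64K does not fit in 16 bits at all.
for size in (16384, 32768, 65535, 65536):
    print(size, as_int16(size), as_uint16(size))
```

Anything from 32768 up to 65535 goes negative when read as a signed short, and 65536 wraps to zero either way, which is consistent with the caution about block sizes above 32k.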