From owner-freebsd-arch@FreeBSD.ORG Fri Jan 1 11:25:34 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AAA031065696; Fri, 1 Jan 2010 11:25:34 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 6C2198FC17; Fri, 1 Jan 2010 11:25:34 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id B6A4F7E9E6; Fri, 1 Jan 2010 11:25:32 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.3/8.14.3) with ESMTP id o01BQAkQ061365; Fri, 1 Jan 2010 11:26:10 GMT (envelope-from phk@critter.freebsd.dk) To: Pieter de Goeje From: "Poul-Henning Kamp" In-Reply-To: Your message of "Thu, 31 Dec 2009 17:12:17 +0100." <200912311712.18347.pieter@degoeje.nl> Date: Fri, 01 Jan 2010 11:26:10 +0000 Message-ID: <61364.1262345170@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Alexander Motin , freebsd-current@freebsd.org, Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-arch@freebsd.org Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 11:25:34 -0000 In message <200912311712.18347.pieter@degoeje.nl>, Pieter de Goeje writes: >Ok, as Miroslav wrote, it does not report 4k sectors: If you care to, and have the time, try the following: for N in (0...7) for M in (0..2) create partition starting at byte offset N*512 newfs -f 4096 -b 32768 /dev/foo mount /dev/foo /mnt time restore(8) a filesystem into it. 
unmount /mnt time fsck_ffs /dev/foo mount /dev/foo /mnt time tar cf /dev/null /mnt unmount /mnt -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Fri Jan 1 20:54:06 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 271351065679; Fri, 1 Jan 2010 20:54:06 +0000 (UTC) (envelope-from pieter@degoeje.nl) Received: from mx.utwente.nl (mx1.utsp.utwente.nl [130.89.2.12]) by mx1.freebsd.org (Postfix) with ESMTP id 9CC958FC0A; Fri, 1 Jan 2010 20:54:05 +0000 (UTC) Received: from nox.student.utwente.nl (nox.student.utwente.nl [130.89.165.91]) by mx.utwente.nl (8.12.10/SuSE Linux 0.7) with ESMTP id o01Kripo006131; Fri, 1 Jan 2010 21:53:46 +0100 From: Pieter de Goeje To: "Poul-Henning Kamp" Date: Fri, 1 Jan 2010 21:53:43 +0100 User-Agent: KMail/1.9.10 References: <61364.1262345170@critter.freebsd.dk> In-Reply-To: <61364.1262345170@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201001012153.44349.pieter@degoeje.nl> X-UTwente-MailScanner-Information: Scanned by MailScanner. Contact icts.servicedesk@utwente.nl for more information. 
X-UTwente-MailScanner: Found to be clean X-UTwente-MailScanner-From: pieter@degoeje.nl X-Spam-Status: No Cc: Alexander Motin , freebsd-current@freebsd.org, Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-arch@freebsd.org Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 20:54:06 -0000 On Friday 01 January 2010 12:26:10 Poul-Henning Kamp wrote: > In message <200912311712.18347.pieter@degoeje.nl>, Pieter de Goeje writes: > >Ok, as Miroslav wrote, it does not report 4k sectors: > > If you care to, and have the time, try the following: > > > for N in (0...7) > for M in (0..2) > create partition starting at byte offset N*512 > newfs -f 4096 -b 32768 /dev/foo > mount /dev/foo /mnt > time restore(8) a filesystem into it. > unmount /mnt > time fsck_ffs /dev/foo > mount /dev/foo /mnt > time tar cf /dev/null /mnt > unmount /mnt That yielded some pretty spectacular results. When the partition is not correctly aligned, performance is not so good ;-). About the benchmark: - Test was conducted in single user mode, while running gstat and systat to monitor the system. - OS and dump archive were located on a different harddisk. - Partition size was 10GB, dump archive ~550MB (dump of /). The archive easily fitted in main memory (8GB). - I observed L(q) sizes over 600 and ms/w times over 30sec during unaligned restore. - CPU load was very low during entire test. Performance for restore was abysmal in the unaligned case, easily being 10 times slower than aligned restore. Newfs was about 5 times as slow. I also tested various combinations of soft-updates, mounting async and block/fragment size. Some helped mildly as one might expect, but nothing significant was gained. 
It was somewhat surprising to see restore being faster than tar in the aligned
case though. Mounting async or using soft updates resulted in a 6sec restore,
while tar took about 8sec.

Raw results (and benchmark script) below. Offset is the number of (512byte)
sectors. The partition is aligned at offset 40. The test begins at offset 34
because that is the first free sector for GPT partitions.

--------------- results ---------------
==> offset: 34 iter: 1
newfs
        0.98 real         0.00 user         0.00 sys
restore
      117.76 real         0.12 user         0.98 sys
fsck_ffs
        1.83 real         0.02 user         0.00 sys
tar
        9.83 real         0.06 user         0.34 sys
==> offset: 34 iter: 2
newfs
        0.98 real         0.00 user         0.00 sys
restore
      149.68 real         0.15 user         1.02 sys
fsck_ffs
        1.75 real         0.02 user         0.00 sys
tar
        9.56 real         0.02 user         0.37 sys
==> offset: 35 iter: 1
newfs
        0.95 real         0.00 user         0.00 sys
restore
      135.31 real         0.11 user         1.07 sys
fsck_ffs
        1.82 real         0.02 user         0.00 sys
tar
        9.74 real         0.04 user         0.35 sys
==> offset: 35 iter: 2
newfs
        1.21 real         0.00 user         0.00 sys
restore
      154.59 real         0.13 user         1.05 sys
fsck_ffs
        1.91 real         0.02 user         0.00 sys
tar
        9.58 real         0.04 user         0.35 sys
==> offset: 36 iter: 1
newfs
        0.98 real         0.00 user         0.00 sys
restore
      151.36 real         0.11 user         1.08 sys
fsck_ffs
        1.77 real         0.02 user         0.00 sys
tar
        9.86 real         0.07 user         0.32 sys
==> offset: 36 iter: 2
newfs
        0.95 real         0.00 user         0.00 sys
restore
      153.71 real         0.14 user         1.05 sys
fsck_ffs
        1.90 real         0.02 user         0.00 sys
tar
       10.02 real         0.03 user         0.37 sys
==> offset: 37 iter: 1
newfs
        0.98 real         0.00 user         0.00 sys
restore
      128.90 real         0.08 user         1.10 sys
fsck_ffs
        1.76 real         0.03 user         0.00 sys
tar
        9.76 real         0.08 user         0.31 sys
==> offset: 37 iter: 2
newfs
        0.98 real         0.00 user         0.00 sys
restore
      147.45 real         0.13 user         1.05 sys
fsck_ffs
        1.83 real         0.02 user         0.00 sys
tar
        9.75 real         0.06 user         0.33 sys
==> offset: 38 iter: 1
newfs
        0.95 real         0.00 user         0.00 sys
restore
      150.35 real         0.15 user         1.04 sys
fsck_ffs
        2.04 real         0.02 user         0.01 sys
tar
        9.42 real         0.04 user         0.35 sys
==> offset: 38 iter: 2
newfs
        0.94 real         0.00 user         0.00 sys
restore
      125.40 real         0.16 user         1.02 sys
fsck_ffs
        1.69 real         0.02 user         0.00 sys
tar
        9.74 real         0.09 user         0.30 sys
==> offset: 39 iter: 1
newfs
        0.95 real         0.01 user         0.00 sys
restore
      135.23 real         0.11 user         1.09 sys
fsck_ffs
        1.93 real         0.02 user         0.01 sys
tar
        9.82 real         0.05 user         0.35 sys
==> offset: 39 iter: 2
newfs
        0.93 real         0.01 user         0.00 sys
restore
      150.30 real         0.07 user         1.12 sys
fsck_ffs
        1.86 real         0.02 user         0.01 sys
tar
        9.61 real         0.08 user         0.32 sys
==> offset: 40 iter: 1
newfs
        0.22 real         0.00 user         0.00 sys
restore
       11.05 real         0.10 user         1.08 sys
fsck_ffs
        1.97 real         0.02 user         0.00 sys
tar
        7.86 real         0.05 user         0.34 sys
==> offset: 40 iter: 2
newfs
        0.17 real         0.00 user         0.00 sys
restore
        8.83 real         0.09 user         1.09 sys
fsck_ffs
        1.67 real         0.01 user         0.01 sys
tar
        7.86 real         0.08 user         0.31 sys
==> offset: 41 iter: 1
newfs
        0.95 real         0.01 user         0.00 sys
restore
      133.37 real         0.16 user         1.04 sys
fsck_ffs
        1.71 real         0.02 user         0.00 sys
tar
        9.43 real         0.07 user         0.31 sys
==> offset: 41 iter: 2
newfs
        1.09 real         0.00 user         0.01 sys
restore
      150.71 real         0.09 user         1.10 sys
fsck_ffs
        2.01 real         0.01 user         0.01 sys
tar
        9.25 real         0.06 user         0.33 sys

------------------- bench.sh ----------------
disk=/dev/ada0
fsarchive=/usr/home/pyotr/rootdump
results=/usr/home/pyotr/bench-results

bench() {
    offset=$((34 + $1))
    echo "==> offset: $offset iter: $2" >> $results
    dd if=/dev/zero of=$disk bs=1m count=1 2> /dev/null
    gpart create -s gpt $disk
    gpart add -b $offset -s 10G -t freebsd-ufs $disk
    partition=${disk}p1
    echo "newfs" >> $results
    time -ao $results newfs -f 4096 -b 32768 $partition || exit 1
    mount $partition /mnt
    cd /mnt
    echo "restore" >> $results
    time -ao $results restore -rf $fsarchive || exit 1
    cd /
    umount /mnt
    echo "fsck_ffs" >> $results
    time -ao $results fsck_ffs $partition || exit 1
    mount $partition /mnt
    echo "tar" >> $results
    time -ao $results tar -cf /dev/null /mnt || exit 1
    umount /mnt
}

true > $results
umount /mnt
for N in `jot 8 0`; do
    for M in `jot 2`; do
        bench $N $M
    done
done

-- 
Pieter de Goeje

From 
owner-freebsd-arch@FreeBSD.ORG Fri Jan 1 22:46:36 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82B2F1065697; Fri, 1 Jan 2010 22:46:36 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 47ADA8FC25; Fri, 1 Jan 2010 22:46:35 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id DC62A7E996; Fri, 1 Jan 2010 22:46:34 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.3/8.14.3) with ESMTP id o01MlCVb065037; Fri, 1 Jan 2010 22:47:12 GMT (envelope-from phk@critter.freebsd.dk) To: Pieter de Goeje From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 01 Jan 2010 21:53:43 +0100." <201001012153.44349.pieter@degoeje.nl> Date: Fri, 01 Jan 2010 22:47:12 +0000 Message-ID: <65036.1262386032@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Alexander Motin , freebsd-current@freebsd.org, Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-arch@freebsd.org Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 22:46:36 -0000 In message <201001012153.44349.pieter@degoeje.nl>, Pieter de Goeje writes: >That yielded some pretty spectacular results. [...] > >Performance for restore was abysmal in the unaligned case, easily being 10 >times slower than aligned restore. Newfs was about 5 times as slow. That is what I expected, only I didn't expect a factor 14 in performance. 
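[Editor's note] The "factor 14" can be checked against the raw numbers: the 14 unaligned restore runs average roughly 142 seconds, the two aligned runs roughly 10 seconds. The following is a minimal sketch of how such a summary could be produced. It assumes the results file has exactly the layout bench.sh writes (a "==> offset: N iter: M" line, then a one-word phase label, then one line of time(1) output), and that an offset counts as aligned when it is a multiple of 8 sectors of 512 bytes. The function name summarize is hypothetical and not part of the thread.

```sh
#!/bin/sh
# summarize FILE [SECTORS_PER_BLOCK]
# Print the mean "real" time per phase, split into aligned and unaligned
# offsets. Assumes the file layout written by bench.sh in this thread.
summarize() {
    awk -v align="${2:-8}" '
    # "==> offset: 34 iter: 1": field 3 is the start offset in sectors
    $1 == "==>"  { aligned = ($3 % align == 0) ? "aligned" : "unaligned"; next }
    # a single-word line is a phase label (newfs, restore, fsck_ffs, tar)
    NF == 1      { phase = $1; next }
    # time(1) output: "<secs> real <secs> user <secs> sys"
    $2 == "real" { sum[phase, aligned] += $1; n[phase, aligned]++ }
    END {
        for (k in sum) {
            split(k, p, SUBSEP)
            printf "%-9s %-9s %8.2f\n", p[1], p[2], sum[k] / n[k]
        }
    }' "$1" | sort
}

# Example (path taken from bench.sh):
#   summarize /usr/home/pyotr/bench-results
```

Dividing the unaligned restore mean by the aligned one gives a ratio of about 14, which matches the figure quoted above.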

I'm not surprised that newfs and restore take the biggest hits in that
test; those are the hard ones as seen from the disk drive. All the
read-only work can be cached and "covered up" that way.

Ideally, newfs/UFS should do a quick test to look for any obvious
boundaries, and DTRT, a nice little task for somebody :-)

Poul-Henning

PS: The reason I asked for 3 iterations was so we could calculate
a standard deviation (see ministat(8)) in order to have a statistically
sound conclusion. With a factor 14 in time difference, I will for
once concede it unnecessary :-)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 00:01:29 2010
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 048FF106566B; Sat, 2 Jan 2010 00:01:29 +0000 (UTC) (envelope-from pieter@degoeje.nl)
Received: from smtp.utwente.nl (smtp1.utsp.utwente.nl [130.89.2.8]) by mx1.freebsd.org (Postfix) with ESMTP id 7D7108FC15; Sat, 2 Jan 2010 00:01:27 +0000 (UTC)
Received: from nox.student.utwente.nl (nox.student.utwente.nl [130.89.165.91]) by smtp.utwente.nl (8.12.10/SuSE Linux 0.7) with ESMTP id o0201GdU003251; Sat, 2 Jan 2010 01:01:16 +0100
From: Pieter de Goeje
To: freebsd-arch@freebsd.org
Date: Sat, 2 Jan 2010 01:01:16 +0100
User-Agent: KMail/1.9.10
References: <65036.1262386032@critter.freebsd.dk>
In-Reply-To: <65036.1262386032@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201001020101.16450.pieter@degoeje.nl>
X-UTwente-MailScanner-Information: Scanned by MailScanner. Contact icts.servicedesk@utwente.nl for more information.
X-UTwente-MailScanner: Found to be clean X-UTwente-MailScanner-From: pieter@degoeje.nl X-Spam-Status: No Cc: Poul-Henning Kamp , freebsd-current@freebsd.org, Alexander Motin , Thomas Backman , Miroslav Lachman <000.fbsd@quip.cz> Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 00:01:29 -0000 On Friday 01 January 2010 23:47:12 Poul-Henning Kamp wrote: > In message <201001012153.44349.pieter@degoeje.nl>, Pieter de Goeje writes: > >That yielded some pretty spectacular results. [...] > > > >Performance for restore was abysmal in the unaligned case, easily being 10 > >times slower than aligned restore. Newfs was about 5 times as slow. > > That is what I expected, only I didn't expect a factor 14 in performance. I'm trying to think of reasons why it performs so poorly, because even in the "write covers two sectors partially" case a read-modify-write cycle shouldn't mess performance up so badly, considering the drive's huge 64MB cache. Maybe the firmware is just not that smart. > > I'm not surprised that newfs and restore take the biggest hits in that > test, those are the hard ones, seen from the disk drive, all the read > only work can be cached and "covered up" that way. > > Ideally, newfs/UFS should do a quick test to look for any obvious > boundaries, and DTRT, a nice little task for somebody :-) A search for the offset for which newfs (or a simpler test) runs fastest? Interesting idea :-) Technically, the drive's at fault here because it should've reported 4K sectors. Perhaps there should be some kind of quirks table :-S for disk drives and/or a sectorsize override knob. Or maybe simply selecting a large enough power-of-two boundary suffices. That could also be done by gpart instead of (or in addition to) newfs. 
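
[Editor's note] The "large enough power-of-two boundary" idea amounts to rounding the first free sector up to the next multiple of the boundary. A minimal sketch, assuming offsets are counted in 512-byte sectors as in bench.sh; the helper name align_up is hypothetical:

```sh
#!/bin/sh
# align_up START BOUNDARY
# Print the smallest multiple of BOUNDARY that is >= START.
# Both arguments are in sectors; BOUNDARY would be 8 for 4 KB alignment
# with 512-byte sectors, or 2048 for a 1 MB boundary.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

align_up 34 8       # prints 40, the aligned offset found in the benchmark
align_up 34 2048    # prints 2048
```

The result could then be passed as the start offset in the same way bench.sh does, for example `gpart add -b "$(align_up 34 8)" -s 10G -t freebsd-ufs $disk`.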
> > Poul-Henning > > PS: The reason I asked for 3 iterations, was so we could calculate > a standard deviation (See: ministat(8)) in order to have a statistical > sound conclusion. With a factor 14 in time difference, I will for > once conceede it unnecessary :-) Noted :-) - Pieter From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 00:03:24 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13FDE106568B for ; Sat, 2 Jan 2010 00:03:24 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.freebsd.org (Postfix) with ESMTP id BDF138FC08 for ; Sat, 2 Jan 2010 00:03:23 +0000 (UTC) Received: from [IPv6:::1] (pooker.samsco.org [168.103.85.57]) (authenticated bits=0) by pooker.samsco.org (8.14.2/8.14.2) with ESMTP id o020334j063295; Fri, 1 Jan 2010 17:03:03 -0700 (MST) (envelope-from scottl@samsco.org) Mime-Version: 1.0 (Apple Message framework v1076) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes From: Scott Long In-Reply-To: <65036.1262386032@critter.freebsd.dk> Date: Fri, 1 Jan 2010 17:03:03 -0700 Content-Transfer-Encoding: 7bit Message-Id: <925A0DA7-5D9B-41FA-B586-6C128F816C58@samsco.org> References: <65036.1262386032@critter.freebsd.dk> To: Poul-Henning Kamp X-Mailer: Apple Mail (2.1076) X-Spam-Status: No, score=-4.2 required=3.8 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.8 X-Spam-Checker-Version: SpamAssassin 3.1.8 (2007-02-13) on pooker.samsco.org Cc: Alexander Motin , Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-current@freebsd.org, freebsd-arch@freebsd.org, Pieter de Goeje Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sat, 02 Jan 2010 00:03:24 -0000

On Jan 1, 2010, at 3:47 PM, Poul-Henning Kamp wrote:

> In message <201001012153.44349.pieter@degoeje.nl>, Pieter de Goeje writes:
>
>> That yielded some pretty spectacular results. [...]
>>
>> Performance for restore was abysmal in the unaligned case, easily being 10
>> times slower than aligned restore. Newfs was about 5 times as slow.
>
> That is what I expected, only I didn't expect a factor 14 in performance.

It's all about read latency in the read-modify update operation. While
buses and caches have gotten steadily faster over the past 20 years,
disk platters and hysteresis fields have not. This is also why buying
faster platters is always an important consideration for overall
performance; a desktop or laptop with 5400RPM drives will feel
significantly slower than one with 7200RPM drives, and 15K RPM drives
still rule the roost.

Thanks a lot for doing the testing. Would it be possible to publish
these results somewhere that can be linked to in the future?

Scott From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 01:53:38 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0FCDE106566B; Sat, 2 Jan 2010 01:53:38 +0000 (UTC) (envelope-from james-freebsd-current@jrv.org) Received: from mail.jrv.org (adsl-70-243-84-13.dsl.austtx.swbell.net [70.243.84.13]) by mx1.freebsd.org (Postfix) with ESMTP id 890CF8FC18; Sat, 2 Jan 2010 01:53:38 +0000 (UTC) Received: from kremvax.housenet.jrv (kremvax.housenet.jrv [192.168.3.124]) by mail.jrv.org (8.14.3/8.14.3) with ESMTP id o021raZm036346; Fri, 1 Jan 2010 19:53:36 -0600 (CST) (envelope-from james-freebsd-current@jrv.org) Authentication-Results: mail.jrv.org; domainkeys=pass (testing) header.from=james-freebsd-current@jrv.org DomainKey-Signature: a=rsa-sha1; s=enigma; d=jrv.org; c=nofws; q=dns; h=message-id:date:from:user-agent:mime-version:to:cc:subject: references:in-reply-to:content-type:content-transfer-encoding; b=mPjn7dhkCqKAkbRph8w7cXrcO0kocS4mK6hNwE1fZUEbP92zyDlKtd9ccOB/RNffg AnfY42bM3x+hujXnBEXpr8ksm1tFyrPhX3+lJbg09mHagDycXHbJ//qE+nwtpLNTRas +Au5gy1lCVKOJC0X62BuJpA4ScojUkNgHssl2+Y= Message-ID: <4B3EA720.8070803@jrv.org> Date: Fri, 01 Jan 2010 19:53:36 -0600 From: "James R. 
Van Artsdalen" User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812) MIME-Version: 1.0 To: Scott Long References: <65036.1262386032@critter.freebsd.dk> <925A0DA7-5D9B-41FA-B586-6C128F816C58@samsco.org> In-Reply-To: <925A0DA7-5D9B-41FA-B586-6C128F816C58@samsco.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 01:53:38 -0000 Scott Long wrote: > It's all about read latency in the read-modify update operation. In this case it's about rotational latency - read/modify/write necessarily adds a full rotation onto the I/O time, above and beyond the read latency. At 5400 rpm that's 11 ms added. To an extent this can be masked by deferring the write and doing other I/O until the disk rotates to a favorable position (a short seek is faster than a rotation). But the cache is only so big and the drive may have to take the hit and do the writes even at an unfavorable point if something like newfs saturates the cache. I am surprised the hit is so bad. I guess we'd have to look at a trace-tape of the I/O but I would have expected many of the writes (after r/m/w) from newfs and restore to be combined in each cg, and many reads eliminated due to data already being cached via speculative reads (while waiting for the disk to rotate for a write the drive might as well read what's passing under the head and cache it). (of course, the bad hit may just be the result of v1.0 firmware, or on-drive processors that turned out to be too slow for efficient r/m/w algorithms under real-world loads) Faster rotation rates win, but the bit-density goes down at higher speeds. 
So a 15k SAS drive may be faster than a 7200 rpm but it's nowhere near twice as fast. From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 05:08:50 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8806106566B for ; Sat, 2 Jan 2010 05:08:50 +0000 (UTC) (envelope-from uqs@spoerlein.net) Received: from acme.spoerlein.net (acme.spoerlein.net [IPv6:2a01:198:206::1]) by mx1.freebsd.org (Postfix) with ESMTP id 12D228FC0C for ; Sat, 2 Jan 2010 05:08:49 +0000 (UTC) Received: from acme.spoerlein.net (localhost.spoerlein.net [IPv6:::1]) by acme.spoerlein.net (8.14.3/8.14.3) with ESMTP id o0258iPY026360 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 2 Jan 2010 06:08:44 +0100 (CET) (envelope-from uqs@spoerlein.net) Received: (from uqs@localhost) by acme.spoerlein.net (8.14.3/8.14.3/Submit) id o0258iFN026359; Sat, 2 Jan 2010 06:08:44 +0100 (CET) (envelope-from uqs@spoerlein.net) Date: Sat, 2 Jan 2010 06:08:44 +0100 From: Ulrich =?utf-8?B?U3DDtnJsZWlu?= To: Poul-Henning Kamp Message-ID: <20100102050843.GI3508@acme.spoerlein.net> Mail-Followup-To: Poul-Henning Kamp , Pieter de Goeje , Alexander Motin , freebsd-current@freebsd.org, Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-arch@freebsd.org References: <201001012153.44349.pieter@degoeje.nl> <65036.1262386032@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <65036.1262386032@critter.freebsd.dk> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Alexander Motin , Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-current@freebsd.org, freebsd-arch@freebsd.org, Pieter de Goeje Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 05:08:50 -0000 On Fri, 01.01.2010 at 22:47:12 +0000, Poul-Henning Kamp wrote: > In message <201001012153.44349.pieter@degoeje.nl>, Pieter de Goeje writes: > > >That yielded some pretty spectacular results. [...] > > > >Performance for restore was abysmal in the unaligned case, easily being 10 > >times slower than aligned restore. Newfs was about 5 times as slow. > > That is what I expected, only I didn't expect a factor 14 in performance. > > I'm not surprised that newfs and restore take the biggest hits in that > test, those are the hard ones, seen from the disk drive, all the read > only work can be cached and "covered up" that way. > > Ideally, newfs/UFS should do a quick test to look for any obvious > boundaries, and DTRT, a nice little task for somebody :-) Indeed, but newfs is only one small part of the puzzle. Think about zpools and, more importantly swap partitions. Sysinstall, fdisk, gpart and bsdlabel should all display some fat warning if partition/label alignment is not, say at 256kB (a common stripe size, right?) and also automatically generate that offset if the user uses automatic settings. 
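
[Editor's note] The kind of warning suggested here boils down to a modulo test on the partition's starting byte offset. A minimal sketch, assuming 512-byte sectors and the 256kB boundary mentioned above as defaults; the function name warn_unaligned and its argument order are hypothetical and do not correspond to any existing sysinstall, fdisk, gpart or bsdlabel option:

```sh
#!/bin/sh
# warn_unaligned START_SECTOR [BOUNDARY_BYTES] [SECTOR_BYTES]
# Return 0 if the start offset is a multiple of the boundary, otherwise
# print a warning on stderr and return 1.
warn_unaligned() {
    start=$1
    boundary=${2:-262144}   # 256 kB, the stripe size suggested in the mail
    sector=${3:-512}        # assumed logical sector size
    if [ $(( start * sector % boundary )) -ne 0 ]; then
        echo "warning: start sector $start is not aligned to $boundary bytes" >&2
        return 1
    fi
    return 0
}

# Example: sector 63 (a classic MBR first-slice start) is not 4 KB aligned.
#   warn_unaligned 63 4096
```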

But then again, this is all wishful thinking from a user's perspective :)

Regards,
Uli

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 09:41:58 2010
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4CF23106566B; Sat, 2 Jan 2010 09:41:58 +0000 (UTC) (envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 0E6528FC13; Sat, 2 Jan 2010 09:41:57 +0000 (UTC)
Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 533897E98F; Sat, 2 Jan 2010 09:41:56 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.3/8.14.3) with ESMTP id o029gYlV067021; Sat, 2 Jan 2010 09:42:34 GMT (envelope-from phk@critter.freebsd.dk)
To: Ulrich =?utf-8?B?U3DDtnJsZWlu?=
From: "Poul-Henning Kamp"
In-Reply-To: Your message of "Sat, 02 Jan 2010 06:08:44 +0100." <20100102050843.GI3508@acme.spoerlein.net>
Date: Sat, 02 Jan 2010 09:42:34 +0000
Message-ID: <67020.1262425354@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
Cc: Alexander Motin , Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-current@freebsd.org, freebsd-arch@freebsd.org, Pieter de Goeje
Subject: Re: File system blocks alignment
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 02 Jan 2010 09:41:58 -0000

In message <20100102050843.GI3508@acme.spoerlein.net>, Ulrich Spörlein writes:

>Sysinstall, fdisk, gpart
>and bsdlabel should all display some fat warning if partition/label
>alignment is not, say at 256kB (a common stripe size, right?)

You overlook that MBR/Fdisk requires bootable slices to start at a
"track". That means that the proper slice alignment typically
will be 8*63=504 sectors.

Unless you want to explore how many BIOSes are still stupid about
this...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 22:22:53 2010
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4829A1065692 for ; Sat, 2 Jan 2010 22:22:53 +0000 (UTC) (envelope-from peterjeremy@acm.org)
Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id C71038FC14 for ; Sat, 2 Jan 2010 22:22:51 +0000 (UTC)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o02L1Lmh021271 for ; Sun, 3 Jan 2010 08:01:21 +1100
Received: from server.vk2pj.dyndns.org (c122-106-232-83.belrs3.nsw.optusnet.com.au [122.106.232.83]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o02L17oY009485 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 3 Jan 2010 08:01:09 +1100
X-Bogosity: Ham, spamicity=0.000000
Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.3/8.14.3) with ESMTP id o02L0wIj039408; Sun, 3 Jan 2010 08:00:58 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.3/8.14.3/Submit) id o02L0vp5039406; Sun, 3 Jan 2010 08:00:57 +1100 (EST) (envelope-from peter)
Date: Sun, 3 Jan 2010 08:00:57 +1100
From: Peter Jeremy
To: Pieter de Goeje
Message-ID: <20100102210056.GD32012@server.vk2pj.dyndns.org>
References: <61364.1262345170@critter.freebsd.dk>
 <201001012153.44349.pieter@degoeje.nl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="x4pBfXISqBoDm8sr"
Content-Disposition: inline
In-Reply-To: <201001012153.44349.pieter@degoeje.nl>
X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: Poul-Henning Kamp , freebsd-current@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: File system blocks alignment
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 02 Jan 2010 22:22:53 -0000

--x4pBfXISqBoDm8sr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Thanks for that testing - the results are quite enlightening. Given
that the PC architecture forces unaligned partition boundaries by
default, these drives are likely to wind up with a bad reputation for
performance.

On 2010-Jan-01 21:53:43 +0100, Pieter de Goeje wrote:
>- Partition size was 10GB, dump archive ~550MB (dump of /). The archive easily
>fitted in main memory (8GB).
...
>It was somewhat surprising to see restore being faster than tar in the aligned
>case though. Mounting async or using soft updates resulted in a 6sec restore,
>while tar took about 8sec.

With async or softupdates, there is likely to be virtually no physical
write activity during the restore, with all the data winding up in the
buffer cache to be flushed during the unmount.

IMO, the 'restore' timing should include the unmount, changing:

>    cd /mnt
>    echo "restore" >> $results
>    time -ao $results restore -rf $fsarchive || exit 1
>    cd /
>    umount /mnt
>    echo "fsck_ffs" >> $results

to:

>    echo "restore" >> $results
>    time -ao $results sh -c "cd /mnt && restore -rf $fsarchive && cd / && umount /mnt" || exit 1
>    echo "fsck_ffs" >> $results

-- 
Peter Jeremy

--x4pBfXISqBoDm8sr
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAks/tAgACgkQ/opHv/APuIdY1wCgqCqdnZxuEidkv1lpZ5dn7vRL
cDEAn2Qvq+H/kFxI2YheKWR5PWsHWvHh
=+Tvm
-----END PGP SIGNATURE-----

--x4pBfXISqBoDm8sr--

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 22:28:52 2010
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 576451065694 for ; Sat, 2 Jan 2010 22:28:52 +0000 (UTC) (envelope-from julian@elischer.org)
Received: from outI.internet-mail-service.net (outi.internet-mail-service.net [216.240.47.232]) by mx1.freebsd.org (Postfix) with ESMTP id 399328FC1C for ; Sat, 2 Jan 2010 22:28:52 +0000 (UTC)
Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id 2225222C3; Sat, 2 Jan 2010 14:28:52 -0800 (PST)
X-Client-Authorized: MaGic Cook1e
Received: from julian-mac.elischer.org (h-67-100-89-137.snfccasy.static.covad.net [67.100.89.137]) by idiom.com (Postfix) with ESMTP id E2B842D6013; Sat, 2 Jan 2010 14:28:50 -0800 (PST)
Message-ID: <4B3FC8A2.1090901@elischer.org>
Date: Sat, 02 Jan 2010 14:28:50 -0800
From: Julian Elischer
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version:
1.0 To: Poul-Henning Kamp References: <67020.1262425354@critter.freebsd.dk> In-Reply-To: <67020.1262425354@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: =?ISO-8859-1?Q?Ulrich_Sp=F6rlein?= , Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-current@freebsd.org, freebsd-arch@freebsd.org, Pieter de Goeje , Alexander Motin Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 22:28:52 -0000

Poul-Henning Kamp wrote:
> In message <20100102050843.GI3508@acme.spoerlein.net>, Ulrich Spörlein writes:
>
>> Sysinstall, fdisk, gpart
>> and bsdlabel should all display some fat warning if partition/label
>> alignment is not, say at 256kB (a common stripe size, right?)
>
> You overlook that MBR/Fdisk requires bootable slices to start at a
> "track". That means that the proper slice alignment typically
> will be 8*63=504 sectors.

No it doesn't (or at least it didn't), but it has become custom to do so.
You could always put your slice anywhere (within the stupid geometry constraints), but the ENDING cylinder/head/block numbers were taken to be at the END of a cylinder when there were no other ways of working out the geometry, so that you could "infer" the geometry. Packet-enabled BIOSes made it all go away.
It was all forced on us by IBM.

>
> Unless you want to explore how many BIOSes still are stupid about
> this...
>
From owner-freebsd-arch@FreeBSD.ORG Sat Jan 2 22:36:46 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7840B106566C; Sat, 2 Jan 2010 22:36:46 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 392118FC14; Sat, 2 Jan 2010 22:36:45 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 94A167E831; Sat, 2 Jan 2010 22:36:43 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.3/8.14.3) with ESMTP id o02MbLKR071178; Sat, 2 Jan 2010 22:37:21 GMT (envelope-from phk@critter.freebsd.dk) To: Julian Elischer From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sat, 02 Jan 2010 14:28:50 PST." <4B3FC8A2.1090901@elischer.org> Date: Sat, 02 Jan 2010 22:37:21 +0000 Message-ID: <71177.1262471841@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: =?ISO-8859-1?Q?Ulrich_Sp=F6rlein?= , Miroslav Lachman <000.fbsd@quip.cz>, Thomas Backman , freebsd-current@freebsd.org, freebsd-arch@freebsd.org, Pieter de Goeje , Alexander Motin Subject: Re: File system blocks alignment X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 22:36:46 -0000

In message <4B3FC8A2.1090901@elischer.org>, Julian Elischer writes:
>Poul-Henning Kamp wrote:
>> You overlook that MBR/Fdisk requires bootable slices to start at a
>> "track". That means that the proper slice alignment typically
>> will be 8*63=504 sectors.
>
>No it doesn't (or at least it didn't), but it has become custom to do so.

Yes it does, for all slices not starting on the first head.
We've been over this madness in the past multiple times. If somebody is willing to suffer this breakage on funky old BIOSes, betting that most of those systems will never run FreeBSD-9, I would say they are being eminently sensible, but still have their work cut out for them.

-- 
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
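
For readers following the arithmetic behind the "8*63=504 sectors" figure quoted in this exchange, here is a minimal sh sketch. It assumes 512-byte logical sectors, a 4096-byte physical sector, and the usual translated geometry of 63 sectors per track. The function name and variables are hypothetical and are not taken from any script posted in the thread.

```sh
#!/bin/sh
# Illustrative only: names and defaults below are assumptions, not from the thread.
SECTOR=512   # logical sector size in bytes
PHYS=4096    # assumed physical sector size in bytes
SPT=63       # sectors per track in the usual translated BIOS geometry

# Print the first N track-aligned start sectors (multiples of SPT)
# whose byte offset is also a multiple of PHYS.
aligned_track_starts() {
    count=$1
    track=1
    found=0
    while [ "$found" -lt "$count" ]; do
        lba=$((track * SPT))
        if [ $(( (lba * SECTOR) % PHYS )) -eq 0 ]; then
            echo "$lba"
            found=$((found + 1))
        fi
        track=$((track + 1))
    done
}

aligned_track_starts 3
```

Under these assumptions the first qualifying start is sector 504, since 63 shares no factor with 8 and only every eighth track boundary falls on a 4K boundary. The traditional first-slice start at sector 63 sits 3584 bytes past a 4K boundary, which is the misaligned case measured earlier in the thread.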