To: freebsd-hackers@freebsd.org
Subject: Re: 9.x -- New Install -- serious partition misalignment
In-Reply-To: <20121208120658.4d115dc0@X220.ovitrap.com>
Date: Sat, 08 Dec 2012 13:57:25 -0800
Message-ID: <23146.1355003845@tristatelogic.com>
From: "Ronald F. Guilmette"
List-Id: Technical Discussions relating to FreeBSD

In message <20121208120658.4d115dc0@X220.ovitrap.com>,
Erich Dollansky wrote:

>Hi,
>
>On Fri, 07 Dec 2012 16:18:57 -0800
>"Ronald F. Guilmette" wrote:
>
>> If possibility (c) applies, then I would also like to know if anybody
>> has any suggestions for how I might be able to get this problem
>> escalated so that (hopefully) it gets dealt with before 9.1-RELEASE
>> is finalized.
>
>162 / 4 is 40.5. So, disk access on modern disks will be slower but not
>catastrophic.

Actually, it appears to me that the performance hit _will_ be rather
catastrophic.  (But obviously it depends on what you are doing with the
system in question.  If you are just playing tetris on it, then no
worries mate.
But if you're doing some hardcore database stuff, you could be majorly
screwed.)

Take a simple example of an attempt to perform a single write+flush of
a 4 KiB block to a location which is aligned to a 4 KiB boundary...
relative to the start of the partition... in each of two different
partitions on two different drives.  Assume that the fragment size
(-f option for newfs) has been set to 4 KiB for both partitions, and
that both partitions reside on so-called "Advanced Format" (4 KiB
physical sector) drives.  The only difference is that one of these two
partitions is NOT properly aligned.

Scenario #1)  In the case of the properly aligned partition, the kernel,
knowing that the fragment size is 4 KiB, can simply write the (properly
aligned) 4 KiB hunk of data to the drive, and this will result in a
single physical write operation on the drive.

Scenario #2)  Contrast that now with the case where the start of the
partition is misaligned.  In this case, because the kernel knows that
the partition into which we are doing the write has a 4 KiB fragment
size, and because it knows that the hunk of data we are writing is
perfectly aligned to a 4 KiB boundary within that partition, the kernel
again believes that it can still just push the new 4 KiB hunk of data
out to the drive in a single operation... and it does.  Inside the
physical drive however, because the partition is misaligned, the result
will be _two_ reads (of two adjacent physical 4 KiB sectors) followed
by _two_ writes (of two adjacent physical 4 KiB sectors).

Congratulations!  Your perfectly tuned DB application now takes four
times as long for each simple write operation.  Performance has been
degraded by a whopping 75%!

This is NOT an insignificant or trivial matter.  In fact this is a
screw-up of major proportions, I do believe.

This probably wouldn't be such a big deal if we were just talking about
Linux.
But FreeBSD has always prided itself on being a serious OS for serious
people with serious work to do... like major server farms and such.  In
the context of high-end applications on high-end hardware where people
are often trying to squeeze out that last drop of performance, the
potentially massive... and stealthy... performance hit from partition
misalignment... which will be seen/experienced on essentially all modern
terabyte+ drives... is just simply unacceptable.

Well, that's my opinion anyway.


Regards,
rfg


P.S.  Warren Block was kind enough to point out to me that he had
already filed a formal PR on this enormously serious bug way back in
October of last year (2011):

    http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/161720

and yet nothing has been done about it in all of the intervening 14
months.

Apparently, the party most directly responsible for bsdinstall is
Nathan Whitehorn, and judging from the date stamps on the material here:

    http://people.freebsd.org/~nwhitehorn/

a reasonable person might conclude, with all due respect, that Mr.
Whitehorn may in fact have "run down the curtain and joined the bleedin'
choir invisible" sometime during the month of April, 2011.

I'll be e-mailing him momentarily to try to find out, one way or the
other.
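

For anyone who wants to check the arithmetic behind Scenario #1 and
Scenario #2 above, here is a rough sketch in Python. It is only a model,
not anything taken from the kernel or from drive firmware. It assumes
512-byte logical sectors, 4 KiB physical sectors, no caching in the
drive, and a drive that must read any physical sector which a write
covers only partially before rewriting it. The function name
physical_ops and the starting LBAs (2048 and 63) are made up for
illustration and do not come from this thread.

```python
LOGICAL = 512     # bytes per logical sector (the unit LBAs count in); assumed
PHYSICAL = 4096   # bytes per physical sector on an Advanced Format drive


def physical_ops(part_start_lba, offset_in_part, length):
    """Return (reads, writes) in physical sectors for a single write.

    part_start_lba : partition start, in logical sectors (hypothetical)
    offset_in_part : byte offset of the write inside the partition
    length         : number of bytes written
    """
    start = part_start_lba * LOGICAL + offset_in_part  # absolute byte offset
    end = start + length                               # one past the last byte
    first = start // PHYSICAL                          # first sector touched
    last = (end - 1) // PHYSICAL                       # last sector touched

    writes = last - first + 1
    reads = 0
    for sector in range(first, last + 1):
        sector_start = sector * PHYSICAL
        sector_end = sector_start + PHYSICAL
        # A sector that is only partly overwritten has to be read first
        # (read-modify-write) so the untouched bytes can be preserved.
        if start > sector_start or end < sector_end:
            reads += 1
    return reads, writes


# Partition starting on a 4 KiB boundary (LBA 2048 = 1 MiB).
print(physical_ops(2048, 0, 4096))   # (0, 1)

# Partition starting off a 4 KiB boundary (LBA 63).
print(physical_ops(63, 0, 4096))     # (2, 2)
```

Under those assumptions the aligned case costs one physical operation
and the misaligned case costs four, which is where the "four times as
long" figure comes from.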