Date:      Sat, 08 Dec 2012 13:57:25 -0800
From:      "Ronald F. Guilmette" <rfg@tristatelogic.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: 9.x -- New Install -- serious partition misalignment
Message-ID:  <23146.1355003845@tristatelogic.com>
In-Reply-To: <20121208120658.4d115dc0@X220.ovitrap.com>


In message <20121208120658.4d115dc0@X220.ovitrap.com>, 
Erich Dollansky <erich@alogt.com> wrote:

>Hi,
>
>On Fri, 07 Dec 2012 16:18:57 -0800
>"Ronald F. Guilmette" <rfg@tristatelogic.com> wrote:
>
>> If possibility (c) applies, then I would also like to know if anybody
>> has any suggestions for how I might be able to get this problem
>> escalated so that (hopefully) it gets dealt with before 9.1-RELEASE
>> is finalized.
>
>162 / 4 is 40.5. So, disk access on modern disks will be slower but not
>catastrophic.

Actually, it appears to me that the performance hit _will_ be rather
catastrophic.  (But obviously it depends on what you are
doing with the system in question.  If you are just playing tetris on it,
then no worries mate.  But if you're doing some hardcore database stuff,
you could be majorly screwed.)

Take a simple example of an attempt to perform a single write+flush of
a 4 KiB block to a location which is aligned to a 4 KiB boundary...
relative to the start of the partition.... in each of two different
partitions on two different drives.  Assume that the fragment size
(-f option for newfs) has been set to 4 KiB for both partitions, and
that both partitions reside on so-called "Advanced Format" (4 KiB
physical sector) drives.  The only difference is that for one of these
two partitions, the partition is NOT properly aligned.

Scenario #1)

In the case of the properly aligned partition, the kernel, knowing that
the fragment size is 4 KiB, can simply write the (properly aligned)
4 KiB hunk of data to the drive and this will result in a single physical
write operation on the drive.

Scenario #2)

Contrast that now with the case where the start of the partition is misaligned.
In this case, because the kernel knows that the partition into which we are
doing the write has a 4 KiB fragment size, and because it knows that the
hunk of data we are writing is perfectly aligned to a 4 KiB boundary within
that partition, the kernel again believes that it can still just push the
new 4 KiB hunk of data out to the drive in a single operation... and it does.
Inside the physical drive however, because the partition is misaligned,
the results will be _two_ reads (of two adjacent physical 4 KiB sectors)
followed by _two_ writes (of two adjacent physical 4 KiB sectors).

Congratulations!  Your perfectly tuned DB application now takes four times
as long for each simple write operation.  Write throughput has been degraded
by a whopping 75%!
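
To put numbers on the two scenarios, here is a minimal Python sketch of
the arithmetic.  It assumes 512-byte logical sectors, 4 KiB physical
sectors, and a drive that does a read-modify-write for every physical
sector that a write only partially covers.  The function name and the
example start sectors (63 and 2048) are made up for illustration; real
drive firmware may cache or coalesce requests and behave differently.

```python
LOGICAL = 512    # bytes per logical sector (assumption)
PHYSICAL = 4096  # bytes per physical sector on an Advanced Format drive


def drive_operations(part_start_lba, offset_in_part, length=4096):
    """Return (reads, writes) the drive performs internally for one write.

    part_start_lba -- partition start, counted in logical sectors
    offset_in_part -- byte offset of the write inside the partition
    length         -- size of the write in bytes

    Simplified model: each partially covered physical sector costs one
    extra read before it can be written back.
    """
    start = part_start_lba * LOGICAL + offset_in_part  # absolute byte offset
    end = start + length                               # exclusive
    first = start // PHYSICAL
    last = (end - 1) // PHYSICAL

    # Only the first and last physical sectors can be partially covered.
    partial = set()
    if start % PHYSICAL:
        partial.add(first)
    if end % PHYSICAL:
        partial.add(last)

    reads = len(partial)
    writes = last - first + 1
    return reads, writes


# Scenario #1: partition starts on a 4 KiB boundary (hypothetical LBA 2048).
aligned = drive_operations(2048, 0)

# Scenario #2: partition starts off a 4 KiB boundary (hypothetical LBA 63).
misaligned = drive_operations(63, 0)

print("aligned    (reads, writes):", aligned)
print("misaligned (reads, writes):", misaligned)
```

Under this model the aligned write costs one physical operation and the
misaligned write costs four (two reads plus two writes), which is where
the "four times as long" figure comes from if every operation takes
about the same time.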

This is NOT an insignificant or trivial matter.  In fact this is a screw up
of major proportions, I do believe.

This probably wouldn't be such a big deal if we were just talking about
Linux.  But FreeBSD has always prided itself on being a serious OS for
serious people with serious work to do... like major server farms and
such.  In the context of high-end applications on high-end hardware where
people are often trying to squeeze out that last drop of performance,
the potentially massive... and stealthy... performance hit from partition
misalignment... which will be seen/experienced on essentially all modern
terabyte+ drives... is just simply unacceptable.  Well, that's my opinion
anyway.


Regards,
rfg


P.S.  Warren Block was kind enough to point out to me that he had already
filed a formal PR on this enormously serious bug way back in October of
last year (2011):

  http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/161720

and yet nothing has been done about it in all of the intervening 14 months.

Apparently, the party most directly responsible for bsdinstall is Nathan
Whitehorn, and judging from the date stamps on the material here:

   http://people.freebsd.org/~nwhitehorn/

a reasonable person might conclude, with all due respect, that Mr. Whitehorn
may in fact have "run down the curtain and joined the bleedin' choir
invisible" sometime during the month of April, 2011.

I'll be e-mailing him momentarily to try to find out, one way or the other.



