From owner-freebsd-chat Sun Jan 27 16:59:13 2002
Date: Sun, 27 Jan 2002 16:58:51 -0800
From: Terry Lambert
To: "Gary W. Swearingen"
Cc: freebsd-chat@FreeBSD.ORG
Subject: Re: Bad disk partitioning policies (was: "Re: FreeBSD Installer (was "Re: ... RedHat ...")")
Message-ID: <3C54A24B.B0B607F@mindspring.com>

"Gary W. Swearingen" wrote:
> But you probably can't, without rewriting much of the FFS treatise.
> People are willing to trust experts when they say a certain behavior
> is the result of the chosen algorithm, if there's a hint that the
> expert has considered the issue (as you have more than hinted).  I
> thank you for trying to explain the reasons, but this just isn't the
> forum for it.
> I don't want to seem ungrateful, but I think you should know that
> much of your explanation is the sort of thing that is often referred
> to by the very term you used in a physical context, "hand waving"
> (maybe with flakes of "snow job" thrown in).  It's better than
> nothing, but it's probably not worth the effort.  Please don't take
> offense; I'm trying, as I did in my last message less bluntly (and
> unsuccessfully), to convince you not to waste your time on incomplete
> explanations of hard-to-explain reasons, especially when the only
> question is how the system behaves, not why it does so.  (Thank you
> for having enough of the former in the last msg.)

Heh.  How about this:

	"It will hurt if you change things without understanding
	 them.  Read and understand the FFS paper."

?

> > Relative to the size of your disk, people complain about
> > very large disks for even a very small free reserve
> > percentage, mostly because they grew up in an era when
> > "that was a lot of space!".
>
> It's not just that.  It's a hunch that defrag considerations should
> have as much to do with the size of files as they do with the amount
> of unused FS.  If the former stays the same, it seems reasonable
> that the free space/reserve/whatever should remain the same for
> similar defrag performance, regardless of FS size.

OK, the hunch is wrong.  If you don't exceed the optimal free
reserve, the file system doesn't fragment.  There is no such thing
as significant fragmentation in an optimally tuned FFS.  The
fragmentation is avoided mathematically, not as a result of having
a reserved "work area" in which active defragmentation occurs.

Read the paper.  ;^).

> > The reality is that the algorithm needs a certain percentage
> > of the space to work correctly, and if you take that away,
> > then it doesn't work correctly.
>
> People reading about -m (or not even that) need a statement at
> least as blunt as that, to prevent many from guessing that the
> talk of percentages is just another obsolete rule of thumb.

Patches?  8-).

> > Really, it'd probably be a good idea to find a reasonable
> > way to make swap take up disk space until you ran out, on
>
> Interesting.

> > This issue has been discussed many times before.  It's
> > in the literature, and it's in the FreeBSD list archives
> > dozens of times, at least.  8-).
>
> And if it was discussed near the -m option, or an SA-level article
> was referred to, we wouldn't be doing it again.

Patches?  8-).

> > To address your suggestions: this would imply that you
> > could get non-worst-case performance on a full disk near a
> > very small free reserve selected administratively.
>
> OK, so it will take a few lines to explain better.

> > The real answer is that the more data on the disk above
> > the optimal free reserve for the algorithm used for block
> > selection, the worse the performance will be, and "worst
> > case" is defined as "the last write before hitting the
> > free reserve limit".  So disk performance degrades
> > steadily, the fuller it gets over the optimal free reserve
> > (which is ~15%, much higher than the free reserve kept on
> > most disks).
>
> So it should say that performance degrades increasingly, from
> negligible at 85% of the full FS to about 3 times slower near 100%
> full (plus increased permanent fragmentation of files).  And that
> this is a result of the algorithms used, and is independent of FS
> size.  And this needs complication to mention the effects of the
> 5% switch and the -o option.

Sure.
Let's see if other people agree with that; it's a bit simplistic,
in that you don't know whether the degradation is linear or
exponential (exponential).  And even saying that raises more
questions from people who want to have knowledge given to them,
instead of having to learn it.  (Such people should have slots
installed in their skulls before they come bother us, since
without a means of "giving" it to them, like slotting a skills
card into their brain, they are wasting their time.  8-)).

> If I understand this correctly (a bad assumption), the performance
> at 95% full is the same regardless of whether I reserve 10% or 1%.

Yes.

> Since I don't care if the "end" of the FS is slow, the only reason
> for picking a large -m I see is to avoid permanently fragmented
> files.  Wrong?

Yes, if we accept the assumption that you don't care if the "end"
of the FS is slow.

Realize that this is all irrelevant, and what we are really talking
about is not whether or not the disk space is able to be used, but
rather "eye candy" for the system owner, so that they can see a
larger "available disk space" number.

I think the confusion comes because anyone who naively looks at the
man pages, without reading them in depth, can come to the conclusion
that the "free reserve" might be there for root's use in recovering
a nearly completely full system, and so that it's an administrative,
rather than an algorithmic, requirement.

I believe that no matter *how well* you document things, you will
still have problems with tourists who don't take the time to read
what you have written, in depth, to the point of understanding it.

> Again, as it is, the documentation implies that performance with a
> small -m is always bad, regardless of FS space remaining.

	"The steady state of disks is full"
				-- Ken Thompson

If you fill the disk up, and then empty it back out, then for
those files created during the "disk full" time, the performance
*is* always bad.

It's a matter of risk.  Like mounting your FS async.
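For the record, the free reserve under discussion is the "minfree"
value set at newfs time and adjustable afterward with tunefs(8).  A
sketch of inspecting and changing it (the device name is just an
example; the filesystem should be unmounted, or mounted read-only,
when you run tunefs):

```shell
# Print the current tunable settings, including the
# "minimum percentage of free space" (the -m value):
tunefs -p /dev/ad0s1e

# Lower the free reserve to 5% -- accepting the performance
# caveats discussed above.  Device name is an example only.
tunefs -m 5 /dev/ad0s1e
```

Note that FFS also switches its block-allocation optimization
between time and space (the -o option mentioned above) based on
how full the filesystem is relative to this reserve.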
The probability is pretty good that you will eventually fill up
your disk, because humans are, by nature, pack rats.

> > BTWBTW: If you screw up an important file this way, you
> > can fix it by backing it up, deleting it, and restoring
> > it, once the disk has dropped down to the optimal free
> > reserve.  This is known as "the poor man's defragger".
>
> mv file file.bak; cp -p file.bak file; rm file.bak  ## ?

No, actually.  The "cp" program doesn't leave sparse files sparse.
You can use the (GNU) tar program's option for handling sparse
files (which it does by inference), or you can use "backup" and
"restore".  If no files are sparse, then the "cp" is OK.

Funny story: I filled up an AIX disk by moving the documentation
pages from one disk to another with "mv", which degrades to "cp"
if it's moving across partitions on AIX.  The problem is that
there are a *lot* of sparse index files.  Since the original move
was "successful", it took me a while to figure out where the
space went.  8-).

> Thanks again.  I've saved your ID in my PR-to-do list and if I
> ever get the easier ones done and write one for -m, I'll CC it
> to you.

NP.

-- Terry
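The sparse-file effect described above is easy to see directly.  A
sketch, assuming a BSD or GNU userland with dd, ls, and du (file
names are arbitrary):

```shell
# Make a file that is almost all hole: seek past 1 MiB and
# write a single byte.  Nothing before the byte is allocated.
dd if=/dev/zero of=sparse.dat bs=1 count=1 seek=1048576 2>/dev/null

ls -l sparse.dat   # apparent size: 1048577 bytes
du -k sparse.dat   # blocks actually allocated: only a few KB

# Whether a copy keeps the hole depends on the cp implementation:
# the 4.4BSD-era cp filled holes in with real zero-filled blocks,
# while GNU cp tries to detect and preserve them.  GNU tar's -S
# (--sparse) flag preserves sparseness by inference, and
# dump/restore handle it natively.
cp sparse.dat copy.dat
du -k copy.dat

rm -f sparse.dat copy.dat
```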