Date: Sat, 30 Jul 2005 23:12:16 +0100
From: Brian Candler
To: Poul-Henning Kamp
Cc: FreeBSD Current, Julian Elischer
Subject: Re: Apparent strange disk behaviour in 6.0

On Sat, Jul 30, 2005 at 08:37:17PM +0200, Poul-Henning Kamp wrote:
> In message <20050730171536.GA740@uk.tiscali.com>, Brian Candler writes:
> >On Sat, Jul 30, 2005 at 03:29:27AM -0700, Julian Elischer wrote:
> >>
> >> The snapshot below is typical when doing tar from one drive to another..
> >> (tar c -C /disk1 f- .|tar x -C /disk2 -f - )
> >>
> >> dT: 1.052  flag_I 1000000us  sizeof 240  i -1
> >>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >>     0    405    405   1057    0.2      0      0    0.0      0      0    0.0    9.8| ad0
> >>     0    405    405   1057    0.3      0      0    0.0      0      0    0.0   11.0| ad0s2
> >>     0    866      3     46    0.4    863   8459    0.7      0      0    0.0   63.8| da0
> >>    25    866      3     46    0.5    863   8459    0.8      0      0    0.0   66.1| da0s1
> >>     0    405    405   1057    0.3      0      0    0.0      0      0    0.0   12.1| ad0s2f
> >>   195    866      3     46    0.5    863   8459    0.8      0      0    0.0   68.1| da0s1d
...
> >But if really is only 12.1% busy (which the 0.3 ms/r implies),
>
> "busy %" numbers is *NOT* a valid measure of disk throughput, please do
> not pay attention to such numbers!

It seems to me that

    reads/sec * milliseconds/read = milliseconds spent reading per second

and that "busy %" is just this figure expressed as a percentage. The
figures in the table above seem to bear this out, bar rounding errors
since ms/r is so small. Or am I mistaken? Examples:

> >>     0    405    405   1057    0.2      0      0    0.0      0      0    0.0    9.8| ad0

405 * 0.2 = 81ms reading = 8% (vs. busy% = 9.8%)

> >>    25    866      3     46    0.5    863   8459    0.8      0      0    0.0   66.1| da0s1

3*0.5 + 863*0.8 = 692ms read/write = 69% (vs. busy% = 66%)

I guess I could dig through the source to check if this is true.
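In the meantime, here is that back-of-envelope written out explicitly
(in Python, just because it's handy); the formula is only my assumption
about how the busy figure is derived, not anything read from the geom
source:

    #!/usr/bin/env python
    # Back-of-envelope check of the busy% arithmetic above.  Assumes
    # busy% ~= (r/s * ms/r + w/s * ms/w) / 10, i.e. milliseconds spent
    # on I/O per second, expressed as a percentage of one second.
    # This is my guess, not a reading of the geom source.

    def estimated_busy(rps, ms_per_read, wps, ms_per_write):
        busy_ms_per_second = rps * ms_per_read + wps * ms_per_write
        return busy_ms_per_second / 1000.0 * 100.0   # percent of a second

    # ad0:   405 r/s at 0.2 ms/r, no writes
    print(estimated_busy(405, 0.2, 0, 0.0))      # ~8.1%  (gstat:  9.8%)
    # da0s1: 3 r/s at 0.5 ms/r, 863 w/s at 0.8 ms/w
    print(estimated_busy(3, 0.5, 863, 0.8))      # ~69.2% (gstat: 66.1%)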
But this is how I had always assumed "busy %" was calculated: time spent
waiting for reads or writes to complete, as opposed to the time spent idle
(with no outstanding read or write request queued).

If I'm right, then the OP is right to ask why both the reading and the
writing disks are well under 100% utilisation for a simple streaming
copy-from or copy-to operation.

> If you want to know how busy your disk is, simply look in the ms/r
> and ms/r columns and decide if you can live with that average
> transaction time.  If it is too high for your liking, then your
> disk is too busy.
>
> If you want to do quantitive predictions, you need to do the
> queue-theory thing on those numbers.
>
> If you know your queue-theory, you also know why busy% is
> a pointless measurement:  It represents the amount of time
> where the queue is non-empty.  It doesn't say anything about
> how quickly the queue drains or fills.

Indeed; if you have multiple processes competing for the disk at random
points in time, then the time to service each request has to be worked out
with queueing theory.  For the same reason, an Internet connection is
considered "full" at around 70% utilisation, because above that the latency
goes through the roof and users get unhappy.  (There is a rough sketch of
that arithmetic in the P.S. below.)

But here we're talking about a single process trying to spool stuff off (or
onto) the disk as quickly as possible.  Surely, if everything is working
properly, it ought to be able to keep the queue of read (or write) requests
permanently non-empty, and therefore the disk should be permanently in use?
That's like an IP pipe carrying a single FTP stream with a sufficiently
large window size: it *should* reach 100% utilisation.

I'm not saying geom is counting wrongly; I am just agreeing with the OP
that the underlying reason for this poor utilisation is worth
investigating.  After all, he also only got 1M/s read and 8M/s write.  It
seems unlikely that the CPU is unable to shift that amount of data per
second, and if the drive or the I/O card were performing poorly, that
still ought to show up as 100% utilisation.

Regards,

Brian.
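
P.S. To make the "queue-theory thing" concrete, this is the sort of
back-of-envelope I have in mind for the shared/random-arrival case.  It
uses the textbook M/M/1 result (mean response time = service time /
(1 - utilisation)) and an assumed 1 ms service time purely for
illustration; nothing here is measured from the disks above:

    #!/usr/bin/env python
    # Illustrative M/M/1 numbers: response time blows up as a *shared*
    # resource approaches full utilisation.  A single sequential stream
    # that always has the next request queued is a different situation
    # and can sit at 100% quite happily.

    service_time_ms = 1.0    # assumed average service time per request

    for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
        response_ms = service_time_ms / (1.0 - rho)
        print("utilisation %2.0f%% -> mean response time %5.1f ms"
              % (rho * 100, response_ms))

Which is why 70% or so is a sensible ceiling when lots of users compete
for the same disk or link, but it says nothing about why a single
streaming tar can't keep the disk at 100%.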