Date: Sat, 30 Jul 2005 23:12:16 +0100
From: Brian Candler
To: Poul-Henning Kamp
Cc: FreeBSD Current, Julian Elischer
Subject: Re: Apparent strange disk behaviour in 6.0

On Sat, Jul 30, 2005 at 08:37:17PM +0200, Poul-Henning Kamp wrote:
> In message <20050730171536.GA740@uk.tiscali.com>, Brian Candler writes:
> >On Sat, Jul 30, 2005 at 03:29:27AM -0700, Julian Elischer wrote:
> >>
> >> The snapshot below is typical when doing tar from one drive to another..
> >> (tar c -C /disk1 f- .|tar x -C /disk2 -f - )
> >>
> >> dT: 1.052  flag_I 1000000us  sizeof 240  i -1
> >>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >>     0    405    405   1057    0.2      0      0    0.0      0      0    0.0    9.8| ad0
> >>     0    405    405   1057    0.3      0      0    0.0      0      0    0.0   11.0| ad0s2
> >>     0    866      3     46    0.4    863   8459    0.7      0      0    0.0   63.8| da0
> >>    25    866      3     46    0.5    863   8459    0.8      0      0    0.0   66.1| da0s1
> >>     0    405    405   1057    0.3      0      0    0.0      0      0    0.0   12.1| ad0s2f
> >>   195    866      3     46    0.5    863   8459    0.8      0      0    0.0   68.1| da0s1d
...
> >But if really is only 12.1% busy (which the 0.3 ms/r implies),
>
> "busy %" numbers is *NOT* a valid measure of disk throughput, please do
> not pay attention to such numbers!

It seems to me that

    reads/sec * milliseconds/read = milliseconds spent reading per second

and that "busy %" is just this figure expressed as a percentage. The
figures in the table above seem to bear this out, bar rounding errors
since ms/r is so small. Or am I mistaken? Examples:

> >>     0    405    405   1057    0.2      0      0    0.0      0      0    0.0    9.8| ad0

405 * 0.2 = 81ms reading = 8% (vs. busy% = 9.8%)

> >>    25    866      3     46    0.5    863   8459    0.8      0      0    0.0   66.1| da0s1

3*0.5 + 863*0.8 = 692ms read/write = 69% (vs. busy% = 66%)

I guess I could dig through the source to check if this is true.
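In the meantime, here is that back-of-envelope written out explicitly
(in Python, just because it's handy); the formula is only my assumption
about how the busy figure is derived, not anything read from the geom
source:

    #!/usr/bin/env python
    # Back-of-envelope check of the busy% arithmetic above.  Assumes
    # busy% ~= (r/s * ms/r + w/s * ms/w) / 10, i.e. milliseconds spent
    # on I/O per second, expressed as a percentage of one second.
    # This is my guess, not a reading of the geom source.

    def estimated_busy(rps, ms_per_read, wps, ms_per_write):
        busy_ms_per_second = rps * ms_per_read + wps * ms_per_write
        return busy_ms_per_second / 1000.0 * 100.0   # percent of a second

    # ad0:   405 r/s at 0.2 ms/r, no writes
    print(estimated_busy(405, 0.2, 0, 0.0))      # ~8.1%  (gstat:  9.8%)
    # da0s1: 3 r/s at 0.5 ms/r, 863 w/s at 0.8 ms/w
    print(estimated_busy(3, 0.5, 863, 0.8))      # ~69.2% (gstat: 66.1%)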
But this is how I had always assumed "busy %" was calculated: time spent
waiting for reads or writes to complete, as opposed to the time spent idle
(with no outstanding read or write request queued).

If I'm right, then the OP is right to ask why both the reading and the
writing disks are well under 100% utilisation for a simple streaming
copy-from or copy-to operation.

> If you want to know how busy your disk is, simply look in the ms/r
> and ms/r columns and decide if you can live with that average
> transaction time.  If it is too high for your liking, then your
> disk is too busy.
>
> If you want to do quantitive predictions, you need to do the
> queue-theory thing on those numbers.
>
> If you know your queue-theory, you also know why busy% is
> a pointless measurement:  It represents the amount of time
> where the queue is non-empty.  It doesn't say anything about
> how quickly the queue drains or fills.

Indeed; if you have multiple processes competing for the disk at random
points in time, then the time to service each request has to be worked out
with queueing theory.  For the same reason, an Internet connection is
considered "full" at around 70% utilisation, because above that the latency
goes through the roof and users get unhappy.  (There is a rough sketch of
that arithmetic in the P.S. below.)

But here we're talking about a single process trying to spool stuff off (or
onto) the disk as quickly as possible.  Surely, if everything is working
properly, it ought to be able to keep the queue of read (or write) requests
permanently non-empty, and therefore the disk should be permanently in use?
That's like an IP pipe carrying a single FTP stream with a sufficiently
large window size: it *should* reach 100% utilisation.

I'm not saying geom is counting wrongly; I am just agreeing with the OP
that the underlying reason for this poor utilisation is worth
investigating.  After all, he also only got 1M/s read and 8M/s write.  It
seems unlikely that the CPU is unable to shift that amount of data per
second, and if the drive or the I/O card were performing poorly, that
still ought to show up as 100% utilisation.

Regards,

Brian.
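
P.S. To make the "queue-theory thing" concrete, this is the sort of
back-of-envelope I have in mind for the shared/random-arrival case.  It
uses the textbook M/M/1 result (mean response time = service time /
(1 - utilisation)) and an assumed 1 ms service time purely for
illustration; nothing here is measured from the disks above:

    #!/usr/bin/env python
    # Illustrative M/M/1 numbers: response time blows up as a *shared*
    # resource approaches full utilisation.  A single sequential stream
    # that always has the next request queued is a different situation
    # and can sit at 100% quite happily.

    service_time_ms = 1.0    # assumed average service time per request

    for rho in (0.5, 0.7, 0.9, 0.95, 0.99):
        response_ms = service_time_ms / (1.0 - rho)
        print("utilisation %2.0f%% -> mean response time %5.1f ms"
              % (rho * 100, response_ms))

Which is why 70% or so is a sensible ceiling when lots of users compete
for the same disk or link, but it says nothing about why a single
streaming tar can't keep the disk at 100%.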