From owner-freebsd-current@FreeBSD.ORG Thu Jul 28 06:54:50 2005
From: Julian Elischer <julian@elischer.org>
Date: Wed, 27 Jul 2005 23:54:45 -0700
To: FreeBSD Current <freebsd-current@freebsd.org>
Message-ID: <42E88135.30603@elischer.org>
Subject: Apparent strange disk behaviour in 6.0

I've been playing around with some raid arrays and I've noticed some odd things.

Firstly, on a 2+HTT (i.e. 4 virtual) CPU system with one SCSI array and an ATA drive, copying data from the ATA drive to the SCSI array seems to be slower than it was on 4.x.

Secondly, systat -vmstat never shows either of the drives as being 100% busy. The most I've seen is the ATA drive being 70% busy. For example, this is theoretically a disk-I/O-bound system, but:

Disks    ad0    da0  pass0  pass1  pass2
KB/t   19.40  11.68   0.00   0.00   0.00
tps      440    539      0      0      0
MB/s    8.34   6.14   0.00   0.00   0.00
% busy    48     50      0      0      0

I don't know how reliable that is, however. I HAVE noticed, however, that the sum of the busy percents for the two drives seems to always be less than 110%. If one goes up, then the other goes down. Not knowing how these numbers are calculated, it's hard to know whether that means anything.

Physically looking at the array, the disks spend a LOT of time doing nothing. The array controller is obviously clustering the writes and seems to be writing them out every 2 seconds, but the disks are only busy for about 1/4 of that time. I don't know how reliable that is as an indication, but whatever the bottleneck is, it's not the drives. The array controller is reporting back that it hardly ever has a queue of more than 1 thing to do, even though tags are set to 253 (occasionally the controller will report it has 20 to do, but the next instant it's caught up again). I plan on net-booting the same machine on 4.11 again and doing the same tests.

If I REALLY get the disks 100% busy by doing:

dd if=/dev/zero of=/raid1/bigfile bs=128k count=1000000

then the system becomes so unresponsive that it takes about 10 seconds for a ^C to get through to stop the dd.
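In case anyone wants to reproduce this, here's a rough sketch of how the test could be driven while logging per-device numbers in the background. The device names (ad0/da0), the /raid1 mount point and the log file name are just what applies on this box / made up for the example:

  #!/bin/sh
  # log per-device stats once a second in the background while the big
  # sequential write runs, then stop the logger when dd finishes
  iostat -d -w 1 ad0 da0 > /tmp/iostat-raid.log &
  LOGGER=$!

  dd if=/dev/zero of=/raid1/bigfile bs=128k count=1000000

  kill $LOGGER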
A systat -vmstat running at the same time in another window slows down and then just updates every now and then. At no stage, however, does it show anything getting close to 100% of CPU time; interrupt time is at about 15% and system time at about 20%. The odd thing is that a tip session talking to the raid controller continues to show responsive behavior, continuing to update the raid stats page, and the network seems to be bringing those updates to me just fine, so the com ports and the network are at least able to function, even if everything else seizes up.

iostat sometimes continues to run, and this is what it showed during one stretch where the rest of the system seemed pretty unresponsive:

      tty             ad0               da0              pass0             cpu
 tin tout  KB/t tps  MB/s   KB/t  tps  MB/s   KB/t tps  MB/s  us ni sy in id
  53   79  6.13  13  0.08  16.68 1746 28.44   0.00   0  0.00   0  0 22  5 73
 604  836  6.00   2  0.01  16.00 1749 27.32   0.00   0  0.00   0  0 28  5 67
 168  240  7.97  31  0.24 128.00   40  4.96   0.00   0  0.00   0  0 27  9 64
 173  251 11.27  11  0.12  16.00 3047 47.61   0.00   0  0.00   0  0 30  5 65
 222  299 12.93  46  0.58  21.72 2092 44.37   0.00   0  0.00   0  0 34  5 60
 225  302 13.29  34  0.44 128.00   39  4.87   0.00   0  0.00   0  0 40 16 43
 172  250  6.82  34  0.23  30.45  217  6.44   0.00   0  0.00   0  0 52 15 33
 191  268  6.22   9  0.05  16.72 1559 25.44   0.00   0  0.00   0  0 18  3 80
 200  278 10.45  31  0.32  18.78 1007 18.46   0.00   0  0.00   0  0 54 11 34
 192  270 12.00   1  0.01  16.00 2827 44.18   0.00   0  0.00   0  0 34  6 59
 213  728  8.80  40  0.34  18.68 1225 22.34   0.00   0  0.00   0  0 42 11 47
 201  250 10.29  11  0.11 128.00    3  0.41   0.00   0  0.00   0  0 22  5 74
 186  281  8.20  37  0.30 125.85   49  5.98   0.00   0  0.00   0  0 33 11 56
 225  302  4.00   3  0.01  16.00 2977 46.52   0.00   0  0.00   0  0 29  4 66

I'm guessing that there may be a red-hot mutex somewhere in the kernel.. not sure what though..
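As a rough sanity check (untested; the field positions are an assumption based on the iostat layout above, with ad0 MB/s in field 5 and da0 MB/s in field 8), something along these lines should show, per sample, whether the two drives ever move data at the same time or whether the throughput just ping-pongs between them:

  iostat -w 1 ad0 da0 pass0 | awk '
      # header lines start with "tty"/"tin"; data lines start with numbers
      $1 ~ /^[0-9]/ {
          printf "ad0 %6.2f MB/s  da0 %6.2f MB/s  combined %6.2f MB/s\n",
              $5, $8, $5 + $8
      }'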