From: Phil Murray
To: Alexander Zagrebin
Cc: "freebsd-current@freebsd.org"
Date: Fri, 18 Dec 2009 21:39:40 +1300
Subject: Re: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA related problem)
Message-Id: <4C1C2598-4157-4B04-8DB8-C84F353AB8B8@nevada.net.nz>
List-Id: Discussions about the use of FreeBSD-current

On 18/12/2009, at 9:15 PM, "Alexander Zagrebin" wrote:

> Big thanks for your reply!
>
>>> I use onboard ICH7 SATA controller with two disks attached:
>>>
>>> atapci1: port 0x30c8-0x30cf,0x30ec-0x30ef,0x30c0-0x30c7,0x30e8-0x30eb,0x30a0-0x30af irq 19 at device 31.2 on pci0
>>> atapci1: [ITHREAD]
>>> ata2: on atapci1
>>> ata2: [ITHREAD]
>>> ata3: on atapci1
>>> ata3: [ITHREAD]
>>> ad4: 1430799MB at ata2-master SATA150
>>> ad6: 1430799MB at ata3-master SATA150
>>>
>>> The disks are used for mirrored ZFS pool.
>>> I have noticed that the system periodically locks up on disk operations.
>>> After approx. 10 min of very slow disk i/o (several KB/s) the speed of disk
>>> operations restores to normal.
>>> gstat has shown that the problem is in ad6.
>>> For example, there is a filtered output of iostat -x 1:
>>>
>>>                 extended device statistics
>>> device    r/s    w/s     kr/s     kw/s wait   svc_t   %b
>>> ad6     985.1    0.0   5093.9      0.0    0     0.2   23
>>> ad6     761.8    0.0   9801.3      0.0    1     0.4   31
>>> ad6     698.7    0.0   9215.1      0.0    0     0.4   30
>>> ad6     434.2  513.9   5903.1  13658.3   48    10.2   55
>>> ad6       3.0  762.8    191.2  28732.3    0    57.6   99
>>> ad6      10.0    4.0    163.9      4.0    1     1.6    2
>>>
>>> Before this line we have a normal operations.
>>> Then the behaviour of ad6 changes (pay attention to high average access time
>>> and percent of "busy" significantly greater than 100):
>>>
>>> ad6       0.0    0.0      0.0      0.0    1     0.0    0
>>> ad6       1.0    0.0      0.5      0.0    1  1798.3  179
>>> ad6       1.0    0.0      1.5      0.0    1  1775.4  177
>>> ad6       0.0    0.0      0.0      0.0    1     0.0    0
>>> ad6      10.0    0.0     75.2      0.0    1   180.3  180
>>> ad6       0.0    0.0      0.0      0.0    1     0.0    0
>>> ad6       1.0    0.0      2.0      0.0    1  1786.7  178
>>> ad6       0.0    0.0      0.0      0.0    1     0.0    0
>>>
>>> And so on for about 10 minutes.
>>> Then the disk i/o is reverted to normal:
>>>
>>> ad6     139.4    0.0   8860.5      0.0    1     4.4   61
>>> ad6     167.3    0.0  10528.7      0.0    1     3.3   55
>>> ad6      60.8  411.5   3707.6   8574.8    1    19.6   87
>>> ad6     163.4    0.0  10334.9      0.0    1     4.4   72
>>> ad6     157.4    0.0   9770.7      0.0    1     5.0   78
>>> ad6     108.5    0.0   6886.8      0.0    0     3.9   43
>>>
>>> There are no ata error messages neither in the system log, nor on the
>>> console.
>>> The manufacture's diagnostic test is passed on ad6 without any errors.
>>> The ad6 also contains swap partition.
>>> I have tried to run several (10..20) instances of dd, which read and write
>>> data from and to the swap partition simultaneously, but it has not called the
>>> lockup.
>>> So there is a probability that this problem is ZFS related.
>>>
>>> I have been forced to switch ad6 to the offline state... :(
>>>
>>> Any suggestions on this problem?
>>>
>> I also have been experiencing the same problem with a different
>> disk/controller (via mpt on a vmware machine). During the same period I
>> notice that system cpu usage hits 80+% and top shows the zfskern process
>> being the main culprit. At the same time I've discovered the
>> kstat.zfs.misc.arcstats.memory_throttle_count sysctl rising. Arc is also
>> normally close to the arc_max limit.
>
> My case has differences.
> 1. CPU usage is near 0%
> 2. zfs's sysctls doesn't change significantly during
>    "normal operation" -> "lockup" -> "normal" transition
> 3. ARC size is far from its limits,
>    kstat.zfs.misc.arcstats.memory_throttle_count: 0
>
> Here my actions, observations and conclusions:
> 1. I have tried to change placements of disks on sata channels.
>    Nothing has changed - the problems still on WD15EADS, although it became
>    ad4.
>    So issue isn't in south bridge, sata cables and so on.
> 2. I have tried to detach ad6 from the pool, to zero system area, and to
>    reattach it again.
>    Of course, resilvering was started. During resilvering 250 GB was copied
>    without lockups and delays. While resilvering, I have tried periodically
>    to load drive with a read operations (dd if=/dev/ad6 of=/dev/null ...).
>    But after resilvering and several minutes of normal mirror operation,
>    lockups appeared again.
>    So drive is seems to be ok and we have a software problem?
> 3. I have noticed that lockups often happens during postgresql activity.
>    postgresql often uses sync.
>    So I have tried to disable ZIL.
>    No success.
> 4. "IDE LED" is constantly on during lockups.
>    So it is really read/write delays.
> 5. I see two variants of zfskern's state:
>    a) it is constantly in the vgeom:io
>    b) it is in either zio->io_ state (when active), or in tx->tx_s (when
>       idle).
>    During lockups it is mostly in zio->io_.
>    What the difference with vgeom:io and zio->io_/tx->tx_s?
>
> May be a problem is in ata?

The WD15EADS is from WD's "green" series of drives.

The WD green drives do not have Time-Limited Error Recovery (TLER)
enabled, so the disk can spend several minutes trying to read a bad
block before it gives up. That plays havoc with RAID arrays, which is
why WD recommends that you don't use the green drives in arrays.

There is more info about the "feature" in the WD FAQ/knowledge base.

> May be i have a problem with its power management?
> Is there a method to completely reset sata channel and drive?
> atacontrol reinit will do it?
>
> Any help is welcomed.
>
> --
> Alexander Zagrebin
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
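
For anyone who wants to catch these stalls as they happen rather than by watching the screen, here is a minimal sketch in Python that scans `iostat -x` rows and flags the pattern described in the thread: very high svc_t, %b above 100, or a request sitting in the queue while nothing completes. The function names and the thresholds (100 ms, 100 %) are assumptions chosen for illustration and are not taken from the thread; the sample rows are copied from the output quoted above.

```python
# Sketch only: flag iostat -x samples that look like the ad6 stall in the thread.
# Thresholds and helper names are illustrative assumptions, not from the thread.

SAMPLE = """\
device r/s w/s kr/s kw/s wait svc_t %b
ad6 985.1 0.0 5093.9 0.0 0 0.2 23
ad6 10.0 4.0 163.9 4.0 1 1.6 2
ad6 0.0 0.0 0.0 0.0 1 0.0 0
ad6 1.0 0.0 0.5 0.0 1 1798.3 179
ad6 10.0 0.0 75.2 0.0 1 180.3 180
ad6 139.4 0.0 8860.5 0.0 1 4.4 61
"""

FIELDS = ("r_s", "w_s", "kr_s", "kw_s", "wait", "svc_t", "busy")


def parse_row(line):
    """Return a dict for one data row, or None for headers and other lines."""
    parts = line.split()
    if len(parts) != 8:
        return None
    try:
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return None  # column header line such as "device r/s w/s ..."
    row = dict(zip(FIELDS, values))
    row["device"] = parts[0]
    return row


def is_stalled(row, svc_t_ms=100.0, busy_pct=100.0):
    """Heuristic: slow service time, busy > 100 %, or queued I/O with no progress."""
    slow = row["svc_t"] >= svc_t_ms
    over_busy = row["busy"] > busy_pct
    stuck_queue = row["wait"] >= 1 and (row["r_s"] + row["w_s"]) == 0
    return slow or over_busy or stuck_queue


def stalled_rows(text, device="ad6"):
    """Collect the rows for one device that match the stall heuristic."""
    rows = (parse_row(line) for line in text.splitlines())
    return [r for r in rows if r and r["device"] == device and is_stalled(r)]


if __name__ == "__main__":
    for r in stalled_rows(SAMPLE):
        print(r["device"], "svc_t=%.1f" % r["svc_t"], "busy=%d" % r["busy"])
```

To use it on a live system, the same `stalled_rows` function could be fed text captured from `iostat -x 1`. The thresholds would need tuning per workload, since a busy but healthy disk can also show short bursts of high svc_t.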