From owner-freebsd-current@FreeBSD.ORG Thu Dec 17 13:53:07 2009 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 20BD7106568D for ; Thu, 17 Dec 2009 13:53:07 +0000 (UTC) (envelope-from prvs=1602d8d4d9=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 79B608FC2D for ; Thu, 17 Dec 2009 13:53:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=multiplay.co.uk; s=Multiplay; t=1261057348; x=1261662148; q=dns/txt; h=Received: Message-ID:From:To:References:Subject:Date:MIME-Version: Content-Type:Content-Transfer-Encoding; bh=C2bnwXmD9bwRDexGQDs4O 2nevh49Gy9Fik5EexMZBrk=; b=INPVavPw9owcVNn1Q8bgGQAp0xtqybL2lr9cz LsragszHrIsr/Bc3ADupD0UG4YFzSP+fg8DKvQfk4ssUzCJXCLY/ziZ6n6BcrRDJ E9efeRFBKjFse4ISPrYMruosm3ULeKA7mvWT6dxUcZ0JUTPITQ9E7PLgjPcz7wzN M0RV+Q= X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 17 Dec 2009 13:42:28 +0000 Received: from r2d2 by mail1.multiplay.co.uk (MDaemon PRO v10.0.4) with ESMTP id md50008878490.msg for ; Thu, 17 Dec 2009 13:42:26 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 17 Dec 2009 13:42:26 +0000 (not processed: message from trusted or authenticated source) X-Authenticated-Sender: Killing@multiplay.co.uk X-MDRemoteIP: 213.123.247.160 X-Return-Path: prvs=1602d8d4d9=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-current@freebsd.org Message-ID: <4497C65B2FB14CE781BABF07F396841D@multiplay.co.uk> From: "Steven Hartland" To: "Alexander Zagrebin" , References: <39309F560B98453EBB9AEA0F29D9D80E@vosz.local> Date: Thu, 17 Dec 2009 13:42:24 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="koi8-r"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5843 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579 Cc: Subject: Re: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA relatedproblem) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Dec 2009 13:53:07 -0000 We see a very similar pattern on straight ufs disks on an areca controller. Machine console becomes totally unresponsive 30 or so seconds at a time, so it maybe something more intergral to device operation, buffercache or something? The IO load on the machine is being generated by updating rrds files in mrtg for a few hundred switch ports and a few thousand servers. device r/s w/s kr/s kw/s wait svc_t b da0 0.0 0.0 0.0 0.0 1584 0.0 0 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 342.6 0.0 5577.1 1241 7325.5 241 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 0.0 0.0 0.0 1241 0.0 0 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 0.0 0.0 0.0 1241 0.0 0 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 930.8 0.0 15165.2 309 10483.0 322 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 267.7 0.0 4330.7 41 10952.8 41 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 232.7 0.0 3739.4 0 2207.4 158 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 0.0 0.0 0.0 0 0.0 0 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 extended device statistics device r/s w/s kr/s kw/s wait svc_t b da0 0.0 0.0 0.0 0.0 0 0.0 0 pass0 0.0 0.0 0.0 0.0 0 0.0 0 pass1 0.0 0.0 0.0 0.0 0 0.0 0 ----- Original Message ----- From: "Alexander Zagrebin" To: Sent: Thursday, December 17, 2009 9:00 AM Subject: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA relatedproblem) > Hi! > > I use onboard ICH7 SATA controller with two disks attached: > > atapci1: port > 0x30c8-0x30cf,0x30ec-0x30ef,0x30c0-0x30c7,0x30e8-0x30eb,0x30a0-0x30af irq 19 > at device 31.2 on pci0 > atapci1: [ITHREAD] > ata2: on atapci1 > ata2: [ITHREAD] > ata3: on atapci1 > ata3: [ITHREAD] > ad4: 1430799MB at ata2-master SATA150 > ad6: 1430799MB at ata3-master SATA150 > > The disks are used for mirrored ZFS pool. > I have noticed that the system periodically locks up on disk operations. > After approx. 10 min of very slow disk i/o (several KB/s) the speed of disk > operations restores to normal. > gstat has shown that the problem is in ad6. > For example, there is a filtered output of iostat -x 1: > > extended device statistics > device r/s w/s kr/s kw/s wait svc_t %b > ad6 818.6 0.0 10840.2 0.0 0 0.4 34 > ad6 300.6 642.0 3518.5 24830.3 50 24.8 72 > ad6 1.0 639.3 63.7 17118.3 0 62.1 98 > ad6 404.5 4.0 6837.7 4.0 0 0.5 18 > ad6 504.5 0.0 13667.2 0.0 1 0.7 32 > ad6 633.3 0.0 13190.3 0.0 1 0.7 38 > ad6 416.3 384.5 8134.7 24606.2 0 16.3 57 > ad6 538.9 76.7 9772.8 2982.2 55 2.9 40 > ad6 31.9 929.5 801.0 37498.6 0 27.2 82 > ad6 635.5 0.0 13087.1 0.0 1 0.6 35 > ad6 579.6 0.0 16669.8 0.0 0 0.8 43 > ad6 603.6 0.0 11697.4 0.0 1 0.7 40 > ad6 538.0 0.0 10438.7 0.0 0 0.9 47 > ad6 30.9 898.4 868.6 40585.4 0 36.6 78 > ad6 653.3 86.6 8566.6 202.7 1 0.8 40 > ad6 737.1 0.0 6429.4 0.0 1 0.6 42 > ad6 717.1 0.0 3958.7 0.0 0 0.5 36 > ad6 1179.5 0.0 2058.9 0.0 0 0.1 15 > ad6 1191.2 0.0 1079.6 0.0 1 0.1 15 > ad6 985.1 0.0 5093.9 0.0 0 0.2 23 > ad6 761.8 0.0 9801.3 0.0 1 0.4 31 > ad6 698.7 0.0 9215.1 0.0 0 0.4 30 > ad6 434.2 513.9 5903.1 13658.3 48 10.2 55 > ad6 3.0 762.8 191.2 28732.3 0 57.6 99 > ad6 10.0 4.0 163.9 4.0 1 1.6 2 > > Before this line we have a normal operations. > Then the behaviour of ad6 changes (pay attention to high average access time > and percent of "busy" significantly greater than 100): > > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 1.0 0.0 0.5 0.0 1 1798.3 179 > ad6 1.0 0.0 1.5 0.0 1 1775.4 177 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 10.0 0.0 75.2 0.0 1 180.3 180 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 83.7 0.0 862.9 0.0 1 21.4 179 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 1.0 0.0 63.7 0.0 1 1707.4 170 > ad6 1.0 0.0 9.0 0.0 0 1791.0 178 > ad6 10.9 0.0 172.2 0.0 2 0.2 0 > ad6 24.9 0.0 553.7 0.0 1 143.3 179 > ad6 0.0 0.0 0.0 0.0 7 0.0 0 > ad6 2.0 23.9 32.4 1529.9 1 336.3 177 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 68.7 0.0 1322.8 0.0 1 26.3 181 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 27.9 0.0 193.7 0.0 1 61.6 172 > ad6 1.0 0.0 2.5 0.0 1 1777.4 177 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 1.0 0.0 2.0 0.0 1 1786.9 178 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 2.0 0.0 6.5 0.0 1 899.4 179 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > ad6 1.0 0.0 2.0 0.0 1 1786.7 178 > ad6 0.0 0.0 0.0 0.0 1 0.0 0 > > And so on for about 10 minutes. > Then the disk i/o is reverted to normal: > > ad6 139.4 0.0 8860.5 0.0 1 4.4 61 > ad6 167.3 0.0 10528.7 0.0 1 3.3 55 > ad6 60.8 411.5 3707.6 8574.8 1 19.6 87 > ad6 163.4 0.0 10334.9 0.0 1 4.4 72 > ad6 157.4 0.0 9770.7 0.0 1 5.0 78 > ad6 108.5 0.0 6886.8 0.0 0 3.9 43 > ad6 101.6 0.0 6381.4 0.0 0 2.6 27 > ad6 109.6 0.0 7013.9 0.0 0 2.0 22 > ad6 121.4 0.0 7769.7 0.0 0 2.4 29 > ad6 92.5 0.0 5922.6 0.0 1 3.4 31 > ad6 122.4 19.9 7833.0 1273.7 0 3.9 54 > ad6 83.6 0.0 5349.5 0.0 0 3.9 33 > ad6 5.0 0.0 318.4 0.0 0 8.1 4 > > There are no ata error messages neither in the system log, nor on the > console. > The manufacture's diagnostic test is passed on ad6 without any errors. > The ad6 also contains swap partition. > I have tried to run several (10..20) instances of dd, which read and write > data > from and to the swap partition simultaneously, but it has not called the > lockup. > So there is a probability that this problem is ZFS related. > > I have been forced to switch ad6 to the offline state... :( > > Any suggestions on this problem? > > -- > Alexander Zagrebin > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.