From: "Matthew Emmerton"
To: "Bruce Campbell"
Subject: Re: ata "fallback to PIO mode" on dual processor AMD systems
Date: Tue, 31 Dec 2002 16:23:30 -0500

[ cc'ing Soren since he's the ATA guru ]

> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else.
> Systems are as follows:
>
>   Motherboard: ASUS A7M266-D
>   CPUs       : 2 x 2000+ AMD MP
>   Memory     : 2 x 512MB Crucial part: CT6472Y265
>
>   Disks (all UDMA100):
>
>                Master        Slave
>     System 1:  WDC WD400BB   WDC WD1000BB
>     System 2:  WDC WD400BB   WDC WD1000BB
>     System 3:  WDC WD400BB   WDC WD800BB
>     System 4:  WDC WD400BB   Maxtor 98196H8
>
>   Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
>
>     commented out:
>
>       cpu      I386_CPU
>       cpu      I486_CPU
>
>     enabled
>
>       options  SMP      # Symmetric MultiProcessor Kernel
>       options  APIC_IO  # Symmetric (APIC) I/O
>
>
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
>
>   dbench 1
>   sleep for 5 minutes
>   dbench 2
>   sleep for 5 minutes
>   dbench 3
>   ...
>
> to simulate 1,2,3... clients.
>
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
>
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
>
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
>
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
>
> If I run:
>
>   atacontrol mode 0 UDMA100 UDMA100
>
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume.  A soft reboot and things
> work in UDMA mode again.  Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
>
> Theories when I search the web for "fallback to PIO mode" include:
>
>   - bad disks
>   - something to do with thermal recalibration
>
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot.  Plus I see the problem on 6 of 8 disks.
>
> The problem is very repeatable.
>
> Can anyone offer any ideas, or suggest investigative steps?  I have a system
> in PIO mode right now.

The slave dropping to PIO after the master does is by design: the master
and slave have to use the same signalling mode since they're on the same
cable.  (People often report lackluster performance from fast UDMA hard
drives that share a channel with a non-UDMA CD-ROM.)

Are you using 80-conductor cables on all your drives?  These are required
to get consistent high throughput, and running without them may cause the
problems you're seeing.

--
Matt Emmerton
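
As a rough aid for the "investigative steps" question, here is a minimal
Python sketch that scans syslog lines shaped like the ones quoted in this
thread and reports, per disk, how many command timeouts were logged and
whether a PIO fallback was attempted.  The function name
summarize_ata_events and both regular expressions are illustrative
assumptions modelled only on the quoted messages (the READ variant in
particular is a guess); nothing here is part of FreeBSD or the ata driver.

```python
import re
from collections import defaultdict

# Patterns modelled on the kernel messages quoted in this thread.
# Assumption: other driver versions may word these lines differently,
# and the READ form is inferred by analogy with the quoted WRITE form.
TIMEOUT_RE = re.compile(r"\b(ad\d+): (?:READ|WRITE) command timeout\b")
FALLBACK_RE = re.compile(r"\b(ad\d+): trying fallback to PIO mode\b")


def summarize_ata_events(lines):
    """Return {device: {"timeouts": int, "pio_fallback": bool}}.

    `lines` is any iterable of log lines, e.g. an open file object.
    Devices that never appear in a matching line are not reported.
    """
    summary = defaultdict(lambda: {"timeouts": 0, "pio_fallback": False})
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            summary[match.group(1)]["timeouts"] += 1
            continue
        match = FALLBACK_RE.search(line)
        if match:
            summary[match.group(1)]["pio_fallback"] = True
    return dict(summary)
```

Passing an open log file (for example the file your syslog writes kernel
messages to) to summarize_ata_events would give a quick per-disk picture
across all four systems, which may help show whether the master always
times out first.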