Message-ID: <002901c3caf9$855abe50$4c1b3dd4@digiware.nl>
From: "Willem Jan Withagen"
To: "Soren Schmidt", "Bruce Evans"
cc: current@freebsd.org
References: <200312250907.hBP97AUG031336@spider.deepcore.dk>
Date: Thu, 25 Dec 2003 16:08:31 +0100
Subject: Re: deadlock in ata_queue_request()
List-Id: Discussions about the use of FreeBSD-current

Not sure if this is the same problem, but I get many of these. And on a
regular basis it freezes my system beyond getting into DDB or
ctrl-alt-del; only a hard reset gets me out.

The console then has:

ad4: READ command timeout tag=0 serv=0 - resetting
ata2: resetting devices ..

Normally this is followed with:

done

... But once in a while the box does not return...

So is this a disk going bad, or is it a bug??
This is with GENERIC-5.1-p11.

The disk is:

ATA channel 2:
    Master: ad4 ATA/ATAPI rev 7

Running on a Promise PDC20269 UDMA133 controller at DMA133.

But the box crashed again (20 minutes up), so I'm getting a little worried.
It's part of a 4 disk vinum raid5, so I've got time to go

--WjW

----- Original Message -----
From: "Soren Schmidt"
To: "Bruce Evans"
Sent: Thursday, December 25, 2003 10:07 AM
Subject: Re: deadlock in ata_queue_request()

> It seems Bruce Evans wrote:
> > ata_queue_request() sleeps in an interrupt handler here:
>
> Yes, I have a local fix to help this; the sleep was originally left in to
> make a backport to -stable easier (i.e. no mutexes), but this needs to be
> changed here. I'll get it committed asap, but it is the holidays and the
> kids have a lot of new toys :)
>
> > % void
> > % ata_queue_request(struct ata_request *request)
> > % {
> > %     /* mark request as virgin (it might be a reused one) */
> > %     request->result = request->status = request->error = 0;
> > %     request->flags &= ~ATA_R_DONE;
> > %
> > %     /* put request on the locked queue at the specified location */
> > %     mtx_lock(&request->device->channel->queue_mtx);
> > %     if (request->flags & ATA_R_AT_HEAD)
> > %         TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain);
> > %     else
> > %         TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain);
> > %     mtx_unlock(&request->device->channel->queue_mtx);
> > %
> > %     /* should we skip start ? */
> > %     if (!(request->flags & ATA_R_SKIPSTART))
> > %         ata_start(request->device->channel);
> > %
> > %     /* if this was a requeue op callback/sleep already setup */
> > %     if (request->flags & ATA_R_REQUEUE)
> > %         return;
> > %
> > %     /* if this is not a callback and we havn't seen DONE yet -> sleep */
> > %     if (!request->callback) {
> > %         while (!(request->flags & ATA_R_DONE))
> > %             tsleep(request, PRIBIO, "atareq", hz/10);
> >               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > %     }
> > % }
> >
> > when it is called from an interrupt handler.  It is called from an interrupt
> > handler as part of timeout processing:
> >
> >     ...
> >     msleep(...)
> >     ata_queue_request(...)
> >     ata_via_family_setmode(...)
> >     ata_identify_devices(...)
> >     ata_reinit(...)
> >     ata_timeout(...)
> >     softclock(...)
> >     ithread_loop(...)
> >     ...
> >
> > The timeout was called here shortly after ad2 hung:
> >
> > Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left)
> > Dec 25 12:28:27 besplex kernel: ata1: resetting devices ..
> > Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device
> > Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed
> >
> > ATA_R_DONE was never set and wakeup_request() was never called either, so
> > softclock() was deadlocked and tsleep() never returned.
> >
> > The system ran surprisingly well with softclock() deadlocked.  ad0 worked
> > and everything that didn't use timeouts worked.  Examples of things that
> > didn't work because they use timeouts:
> > - syscons screen updates.
> > - statistics in top and systat.
> > - sleep 1 in shells.
> > - mbmon (shows the status, then never repeats).
> >
> > I tried the following to recover:
> > - call wakeup(request) using ddb.  This worked, but ATA_R_DONE was never
> >   set so ata_queue_request() just looped.
> > - also ignore the ATA_R_DONE check using ddb.  This un-deadlocked
> >   softclock(), but ata1 remained wedged.
> > - then call "atacontrol reinit 1".
> >   This partly worked:
> >
> > Dec 25 14:31:12 besplex kernel: ata1: resetting devices ..
> > Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration
> > Dec 25 14:31:44 besplex kernel: done
> >
> >   but the ata driver caused a null pointer panic an instant later.
> >
> > - ad2 didn't come back after a hard reset.
> > - ad2 came back after a power cycle.
> >
> > This was on an undermydesktop.  Problems resuming on laptops may be similar.
> > The hardware may really be wedged.  Then the software shouldn't make things
> > worse by sleeping or spinning in the timeout handler.
> >
> > Bruce
> >
>
> -Søren
> Yes I know it works under windows!!
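
To illustrate Bruce's point that the timeout handler should neither sleep nor
spin, here is a minimal userland sketch of one common way around it: the
handler only records that a channel needs a reinit, and a separate context
that is allowed to wait does the work later. This is not the ata driver's
code and not Soren's pending fix; every name in it (reinit_queue,
timeout_handler, worker_drain, MAX_PENDING) is hypothetical and made up for
the illustration.

```c
/* Userland sketch only: all names are hypothetical, not FreeBSD kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PENDING 8

struct reinit_queue {
    int channels[MAX_PENDING];  /* channel numbers awaiting a reinit */
    size_t count;               /* number of valid entries */
};

/*
 * Stands in for the timeout context, which must not sleep or spin.
 * It only records the work and returns immediately.
 */
static bool
timeout_handler(struct reinit_queue *q, int channel)
{
    if (q->count == MAX_PENDING)
        return false;           /* queue full: report failure, never block */
    q->channels[q->count++] = channel;
    return true;
}

/*
 * Stands in for a worker context where waiting would be allowed.
 * Processes every queued channel and returns how many were handled.
 */
static size_t
worker_drain(struct reinit_queue *q)
{
    size_t done = 0;

    for (size_t i = 0; i < q->count; i++) {
        printf("reinit channel %d\n", q->channels[i]);
        done++;
    }
    q->count = 0;
    return done;
}
```

The design choice is simply that the handler's running time no longer depends
on the hardware: if the device is really wedged, only the worker waits, and
other timeouts keep firing.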