From: Nate Lawson
Date: Wed, 04 Aug 2004 12:08:13 -0700
To: Matthew Dillon
cc: current@freebsd.org
cc: sos@deepcore.dk
Subject: Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt")
Message-ID: <4111341D.7050106@root.org>
References: <410AD054.8070202@root.org> <200407310013.i6V0DI9D085697@apollo.backplane.com>
In-Reply-To: <200407310013.i6V0DI9D085697@apollo.backplane.com>
List-Id: Discussions about the use of FreeBSD-current

Matthew Dillon wrote:
> :I've tracked down the source of the memory corruption in -current that
> :results when booting with various CD and DVD drives (especially the ones
> :that come with Thinkpads including T23, R32, T41, etc.)  The panic is
> :..
>
>     Nick,

s/Nick/Nate and not either of the Williams ones (Net and FreeBSD).  :)

>     what about the retry code in ata_completed()?  (ata-queue.c 229).
>     Does it need to reset donecount as well?  Both the code in 5.x and
>     the code in 4.x looks 'dangerous' with regards to general retries.

Hmm, it seems like this could be a problem with requests that are
re-queued.  It's likely that donecount was never incremented in the
error cases but I don't know the code well enough to say this.  There is
also the question of whether it's ok to retry a request in immediate
mode that previously was done as a queued request.  I'll let Soeren
address this.

>     The 5.x code seems to handle retries generically via
>     ata_finish()->ata_completed()->(retry handling), and this seems to
>     include IMMEDIATE requests, and it does not appear to reset the
>     donecount when it requeues.
>
>     The 4.x code seems to handle retries in ad_timeout() and ad_interrupt()
>     (and doesn't reset donecount in either case as far as I can tell),
>     and the 4.x code's addump() seems to rely on donecount in its transfer
>     loop (but I do not see any similar reliance in the 5.x code).

I don't see any obvious problems here but looking into error handling or
other uncommon paths is usually a good way to find latent issues.

-Nate
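
To illustrate the retry concern discussed above, here is a minimal
user-space sketch in C of how a stale progress counter could shorten a
retried transfer.  Everything in it is hypothetical: struct demo_request,
demo_attempt() and demo_run() are made-up names, not the ata(4) driver's
structures or functions.  The sketch also assumes that the failed attempt
advanced donecount and that the retry restarts from the beginning of the
buffer.  As noted above, the real error paths may never increment
donecount, in which case the hazard does not arise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request; NOT the real ata(4) request structure. */
struct demo_request {
    size_t bytecount;   /* total bytes the caller asked for */
    size_t donecount;   /* bytes counted as transferred so far */
    int    retries;     /* retries still allowed */
};

/*
 * One simulated transfer attempt.  The loop runs until donecount reaches
 * bytecount, counting `chunk` bytes per step.  When inject_error is set,
 * the attempt fails once half of the request has been counted.
 * *moved reports how many bytes this attempt actually handled.
 * Returns 0 on success, -1 on (simulated) error.
 */
static int
demo_attempt(struct demo_request *req, size_t chunk, int inject_error,
    size_t *moved)
{
    assert(chunk > 0);  /* a zero chunk would never make progress */
    *moved = 0;
    while (req->donecount < req->bytecount) {
        size_t n = req->bytecount - req->donecount;

        if (n > chunk)
            n = chunk;
        req->donecount += n;
        *moved += n;
        if (inject_error && req->donecount >= req->bytecount / 2)
            return (-1);
    }
    return (0);
}

/*
 * Run a request whose first attempt fails and whose retry succeeds.
 * Returns the bytes handled by the final attempt.  With reset_on_retry
 * set, donecount is cleared before the request is retried.
 */
static size_t
demo_run(size_t bytecount, size_t chunk, int reset_on_retry)
{
    struct demo_request req = { bytecount, 0, 1 };
    size_t moved = 0;
    int inject_error = 1;

    while (demo_attempt(&req, chunk, inject_error, &moved) != 0 &&
        req.retries-- > 0) {
        inject_error = 0;           /* only the first attempt fails */
        if (reset_on_retry)
            req.donecount = 0;      /* start the retry from scratch */
    }
    return (moved);
}
```

With a 4096-byte request and 512-byte chunks, demo_run(4096, 512, 1)
returns 4096, while demo_run(4096, 512, 0) returns 2048: the retry stops
early because the counter still holds the progress of the failed attempt.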