Date:      Wed, 04 Aug 2004 12:08:13 -0700
From:      Nate Lawson <nate@root.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        sos@deepcore.dk
Subject:   Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt")
Message-ID:  <4111341D.7050106@root.org>
In-Reply-To: <200407310013.i6V0DI9D085697@apollo.backplane.com>
References:  <410AD054.8070202@root.org> <200407310013.i6V0DI9D085697@apollo.backplane.com>

Matthew Dillon wrote:
> :I've tracked down the source of the memory corruption in -current that
> :results when booting with various CD and DVD drives (especially the ones
> :that come with Thinkpads including T23, R32, T41, etc.)  The panic is
> :..
> 
>     Nick, 

s/Nick/Nate/, and not either of the Williams ones (Net and FreeBSD).  :)

>     what about the retry code in ata_completed()? (ata-queue.c 229).
>     Does it need to reset donecount as well?  Both the code in 5.x and 
>     the code in 4.x looks 'dangerous' with regards to general retries.

Hmm, it seems like this could be a problem for requests that are
re-queued.  It's likely that donecount was never incremented in the
error cases, but I don't know the code well enough to say for sure.
There is also the question of whether it's ok to retry, in immediate
mode, a request that was previously issued as a queued request.
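
To make the concern concrete, here is a minimal sketch of how a stale
donecount could skew the accounting on a retry.  Everything in it is
hypothetical: struct sketch_request, sketch_attempt(), sketch_requeue()
and sketch_run() are invented for illustration and are not the real
ata(4) structures or functions.  It also assumes a failed attempt has
already advanced donecount, which may not be true of the actual driver.

```c
#include <stddef.h>

#define SKETCH_SECTOR 512

/* Hypothetical, simplified request; NOT the real struct ata_request. */
struct sketch_request {
    size_t bytecount;   /* total bytes the caller asked for */
    size_t donecount;   /* bytes accounted as transferred so far */
    int    retries;     /* retries remaining */
    int    error;       /* nonzero if the last attempt failed */
};

/*
 * Simulate one attempt.  Assumption: the device restarts the command from
 * the beginning on every attempt, while donecount keeps accumulating.
 * If fail_after is nonzero, the attempt fails once that many bytes moved.
 */
static void sketch_attempt(struct sketch_request *req, size_t fail_after)
{
    size_t moved = 0;

    req->error = 0;
    while (moved < req->bytecount) {
        if (fail_after != 0 && moved >= fail_after) {
            req->error = 1;
            return;
        }
        moved += SKETCH_SECTOR;
        req->donecount += SKETCH_SECTOR;
    }
}

/* Requeue for a retry; returns 1 if requeued, 0 if out of retries. */
static int sketch_requeue(struct sketch_request *req, int reset_donecount)
{
    if (req->retries <= 0)
        return 0;
    req->retries--;
    req->error = 0;
    if (reset_donecount)
        req->donecount = 0;   /* the reset being asked about */
    return 1;
}

/* Fail halfway through a 2048-byte request, retry once, report donecount. */
size_t sketch_run(int reset_donecount)
{
    struct sketch_request req = { 2048, 0, 1, 0 };

    sketch_attempt(&req, 1024);
    if (req.error && sketch_requeue(&req, reset_donecount))
        sketch_attempt(&req, 0);
    return req.donecount;
}
```

Under these assumptions, sketch_run(0) ends with donecount larger than
bytecount, while sketch_run(1) ends with the two equal.  Any code that
uses donecount as a buffer offset or a loop bound would only be safe in
the second case.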

I'll let Soeren address this.

>     The 5.x code seems to handle retries generically via 
>     ata_finish()->ata_completed()->(retry handling), and this seems to
>     include IMMEDIATE requests, and it does not appear to reset the 
>     donecount when it requeues.
> 
>     The 4.x code seems to handle retries in ad_timeout() and ad_interrupt()
>     (and doesn't reset donecount in either case as far as I can tell),
>     and the 4.x code's addump() seems to rely on donecount in its transfer
>     loop (but I do not see any similar reliance in the 5.x code).

I don't see any obvious problems here, but looking into error handling or
other uncommon paths is usually a good way to find latent issues.

-Nate


