From owner-freebsd-current@FreeBSD.ORG Mon Aug 2 06:21:31 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4A02516A4CE for ; Mon, 2 Aug 2004 06:21:31 +0000 (GMT) Received: from www.cryptography.com (li-22.members.linode.com [64.5.53.22]) by mx1.FreeBSD.org (Postfix) with ESMTP id DA1C543D49 for ; Mon, 2 Aug 2004 06:21:30 +0000 (GMT) (envelope-from nate@root.org) Received: from [10.0.5.50] (adsl-64-171-186-94.dsl.snfc21.pacbell.net [64.171.186.94]) by www.cryptography.com (8.12.8/8.12.8) with ESMTP id i726LKra030144; Sun, 1 Aug 2004 23:21:21 -0700 Message-ID: <410DDD60.4090903@root.org> Date: Sun, 01 Aug 2004 23:21:20 -0700 From: Nate Lawson User-Agent: Mozilla Thunderbird 0.7 (X11/20040702) X-Accept-Language: en-us, en MIME-Version: 1.0 To: =?ISO-8859-1?Q?S=F8ren_Schmidt?= References: <410AD054.8070202@root.org> <410B61F2.90504@DeepCore.dk> In-Reply-To: <410B61F2.90504@DeepCore.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: current@freebsd.org Subject: Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt") X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Aug 2004 06:21:31 -0000 Søren Schmidt wrote: > Nate Lawson wrote: > >> I've tracked down the source of the memory corruption in -current that >> results when booting with various CD and DVD drives (especially the ones >> that come with Thinkpads including T23, R32, T41, etc.) The panic is >> obvious when running with INVARIANTS ("memory modified after free") but >> not so obvious in other configurations. For instance, without >> INVARIANTS, part of the rt_info structure is corrupted on my wireless >> card, resulting in a panic during ifconfig on boot. This is likely the >> source of other problems, including phk's ACPI panic (again, only >> triggered when booting with the CD drive in the bay.) > > OK, first thanks for digging into this! The other major bug I'm hunting is the EISA probe hanging. I want 5.3 to be reliable. >> The root problem is that ata_timeout() fires and calls ata_pio_read() >> which overwrites 512 bytes random memory. There are actually two bugs >> here that overwrite memory. The code path is as follows: >> >> 1. ata runs an IDENTIFY command on each drive. It reaches this stack: >> >> ata_getparam() >> ata_identify_devices() >> ata_boot_attach() >> >> 2. ata_getparam() allocates a request and runs it: >> ata_alloc_request() >> loop on retries (2 max) >> fill out an immediate read request for 512 bytes (DEV_BSIZE) >> *** Bug 1: transfersize is 512 bytes but sizeof(struct ata_request) is >> much less (~80 bytes). >> ata_queue_request() starts the request and arms a timeout > > Its correct to allocate via ata_alloc_request, the data is *not* put > into the request, but into another memory area (atadev->param) so this > is not a bug. You're correct. Sorry for reading this wrong. I earlier made a note to check the size of atadev->param and mistakenly checked sizeof(ata_request). The params are indeed the right size (512 bytes) and the size in ata.h is noted in 16 bit half words. You might want a CTASSERT that DEV_BSIZE == sizeof(ata_params). -Nate