Date:      Mon, 1 Dec 2003 13:09:23 +0100
From:      Thomas Moestl <t.moestl@tu-bs.de>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        sparc@freebsd.org
Subject:   Re: panic: trap: memory address not aligned in ata_prtdev() with Nov 18 GENERIC
Message-ID:  <20031201120923.GA3276@timesink.dyndns.org>
In-Reply-To: <Pine.NEB.3.96L.1031130202153.66375k-100000@fledge.watson.org>
References:  <Pine.NEB.3.96L.1031130202153.66375k-100000@fledge.watson.org>


--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, 2003/11/30 at 20:29:09 -0500, Robert Watson wrote:
> Unfortunately, I didn't have dumps set up on this box.  On the other hand,
> given that the panic was in the ata code, perhaps I wouldn't have got a
> dump anyway.  This was with a November 18th GENERIC kernel on a blade100. 
> dmesg also below.  This appears to be highly reproduceable, and might be a
> property of the bgfsck running on the system.
> 
> [...]
> 
> db> show msgbuf
> msgbufp = 0xfffff80000407fe0
> magic = 63062, size = 32736, r= 4790, w = 4860, ptr = 0xfffff80000400000,
> cksum=
>  377365
> panic: trap: memory address not aligned
> cpuid = 0;
> Debugger("panic")
> ...
> db> trace
> panic() at panic+0x174
> trap() at trap+0x3b4
> -- memory address not aligned sfar=0xdedeadc0ee sfsr=0x40029
> %o7=0xc007eda8 --
> ata_prtdev() at ata_prtdev+0x14
> ata_timeout() at ata_timeout+0x130
> softclock() at softclock+0x1a0
> ithread_loop() at ithread_loop+0x1b8
> fork_exit() at fork_exit+0x84
> fork_trampoline() at fork_trampoline+0x8

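This can happen when an ATA operation times out, and is caused by an
access to a freed structure: ata_timeout() calls the interrupt handler,
which may complete and free the request, and afterwards still
dereferences request->device to print the warning. I have attached a
workaround; IIRC sos is developing a more complete fix for this.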
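
For illustration only, here is a stand-alone sketch of the pattern the
workaround uses: copy what the message needs before making the call that
may free the request. All names ending in _sketch are hypothetical
stand-ins, not the real ata(4) structures or functions, and the string
field merely plays the role of ata_cmd2str()'s result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real ata(4) structures. */
struct dev_sketch  { const char *name; };
struct req_sketch  { struct dev_sketch *device; const char *cmd; };
struct chan_sketch { struct req_sketch *running; };

/* Plays the role of ch->hw.interrupt(): finishes and frees the request. */
static void
interrupt_sketch(struct chan_sketch *ch)
{
    free(ch->running);
    ch->running = NULL;
}

/* Returns 1 and fills 'msg' if the request was recovered, 0 otherwise. */
static int
timeout_sketch(struct chan_sketch *ch, struct req_sketch *request,
    char *msg, size_t msglen)
{
    struct dev_sketch *reqdev;
    const char *reqstr;

    assert(request != NULL);

    /* Copy what the message needs while 'request' is still valid. */
    reqdev = request->device;
    reqstr = request->cmd;

    /* After this call 'request' may be freed; do not dereference it. */
    interrupt_sketch(ch);

    /*
     * The driver compares ch->running with request; the sketch only
     * checks that the channel went idle, so the stale pointer is
     * never touched again.
     */
    if (ch->running == NULL) {
        snprintf(msg, msglen,
            "%s: WARNING - %s recovered from missing interrupt",
            reqdev->name, reqstr);
        return (1);
    }
    return (0);
}
```

The copies stay usable in the sketch because the device and the command
string are not owned by the request. The workaround relies on the same
being true for request->device and for whatever ata_cmd2str() returns.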

ISTR the timeouts were caused by the fact that Blade 100s come with
ATA66-capable disks and controllers, but a non-ATA66 (40-pin) cable, and
that for some reason the driver's check for this situation did not
work. I am not seeing this on my machine because I replaced the cable
long ago when I added another disk.

Can you confirm that your box has only a 40-pin cable?

	- Thomas

-- 
Thomas Moestl <t.moestl@tu-bs.de>	http://www.tu-bs.de/~y0015675/
              <tmm@FreeBSD.org>		http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0  9C0F 1FE6 4F1D 419C 776C

--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ata-timo.diff"

Index: ata-queue.c
===================================================================
RCS file: /vol/ncvs/src/sys/dev/ata/ata-queue.c,v
retrieving revision 1.11
diff -u -r1.11 ata-queue.c
--- ata-queue.c	20 Oct 2003 14:28:37 -0000	1.11
+++ ata-queue.c	20 Nov 2003 00:56:48 -0000
@@ -316,6 +316,8 @@
 ata_timeout(struct ata_request *request)
 {
     struct ata_channel *ch = request->device->channel;
+    struct ata_device *reqdev = request->device;
+    char *reqstr = ata_cmd2str(request);
     int quiet = request->flags & ATA_R_QUIET;
 
     /* clear timeout etc */
@@ -324,10 +326,11 @@
     /* call hw.interrupt to try finish up the command */
     ch->hw.interrupt(request->device->channel);
     if (ch->running != request) {
+	/* request might already be freed - use copies. */
 	if (!quiet)
-	    ata_prtdev(request->device,
+	    ata_prtdev(reqdev,
 		       "WARNING - %s recovered from missing interrupt\n",
-		       ata_cmd2str(request));
+		       reqstr);
 	return;
     }
 

--zYM0uCDKw75PZbzx--


