Date:      Mon, 2 Dec 1996 21:16:50 -0800 (PST)
From:      asami@freebsd.org (Satoshi Asami)
To:        stable@freebsd.org
Cc:        gibbs@freebsd.org
Subject:   2.1.6 (sort of) ahc problem
Message-ID:  <199612030516.VAA27205@silvia.HIP.Berkeley.EDU>

I've had a -stable box (around 2.1.6) crash three times today under
heavy NFS load.  (But it's only a 10BaseT network so it's not THAT
heavy....)

Here's one:

===
## gdb -k kernel.4 vmcore.4
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), 
Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1e0000
current pcb at 1d2ddc
panic: free: multiple frees
#0  0xf019c10b in boot ()
(kgdb) bt
#0  0xf019c10b in boot ()
#1  0xf0116d53 in panic ()
#2  0xf010fc93 in free ()
#3  0xf0121d82 in m_freem ()
#4  0xf0151e18 in nfsrv_read ()
#5  0xf01601b3 in nfssvc_nfsd ()
#6  0xf015faba in nfssvc ()
#7  0xf01a4246 in syscall ()
#8  0xf01998eb in Xsyscall ()
Cannot access memory at address 0xefbfde70.
===

and another one:

===
## echo "bt" | gdb -k kernel.5 vmcore.5
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), 
Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1e0000
current pcb at 1d2ddc
panic: free: multiple frees
#0  0xf019c10b in boot ()
(kgdb) #0  0xf019c10b in boot ()
#1  0xf0116d53 in panic ()
#2  0xf010fc93 in free ()
#3  0xf0121d82 in m_freem ()
#4  0xf0143526 in ipintr ()
#5  0xf019ad6d in swi_net_next ()
#6  0xf019a94d in Xresume11 ()
#7  0xf012938f in biowait ()
#8  0xf0127ae7 in bread ()
#9  0xf0180d05 in ffs_update ()
#10 0xf0183022 in ffs_sync ()
#11 0xf012e062 in sync ()
#12 0xf019c005 in boot ()
#13 0xf0116d53 in panic ()
#14 0xf010fc93 in free ()
#15 0xf0121d82 in m_freem ()
#16 0xf0151e18 in nfsrv_read ()
#17 0xf01601b3 in nfssvc_nfsd ()
#18 0xf015faba in nfssvc ()
#19 0xf01a4246 in syscall ()
#20 0xf01998eb in Xsyscall ()
#21 0x10d3 in ?? ()
===

It seems to die right after messages like this:

===
Dec  2 18:01:03 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6
Dec  2 18:01:03 stampede /kernel: Ordered Tag queued
Dec  2 18:01:08 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6
===

This is a P6-200 system with six 8GB disks (Ultra-Wide, Ultra mode
disabled) on a single channel of a 3940UW.

Sometimes it won't crash:

===
Dec  1 13:10:10 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0
Dec  1 13:10:10 stampede /kernel: Ordered Tag queued
Dec  1 13:10:15 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0
Dec  1 13:10:15 stampede /kernel: ahc1: Issued Channel A Bus Reset #1. 1 SCBs aborted
Dec  1 13:10:15 stampede /kernel: sd6(ahc1:13:0): UNIT ATTENTION asc:29,0
Dec  1 13:10:15 stampede /kernel: sd6(ahc1:13:0):  Power on, reset, or bus device reset occurred
Dec  1 13:10:15 stampede /kernel: , retries:3
Dec  1 13:10:15 stampede /kernel: sd6(ahc1:13:0): NOT READY asc:4,1
Dec  1 13:10:15 stampede /kernel: sd6(ahc1:13:0):  Logical unit is in process of becoming ready
Dec  1 13:10:15 stampede /kernel: , retries:2
 :
===

(It recovered after a few hundred of these "NOT READY" lines.)

I guess Justin is going to update the ahc code soon, but just FYI.

Satoshi


