From owner-freebsd-stable Mon Dec 2 21:17:27 1996
Return-Path: owner-stable
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA09092 for stable-outgoing; Mon, 2 Dec 1996 21:17:27 -0800 (PST)
Received: from dfw-ix7.ix.netcom.com (dfw-ix7.ix.netcom.com [206.214.98.7]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id VAA09087; Mon, 2 Dec 1996 21:17:25 -0800 (PST)
Received: from silvia.HIP.Berkeley.EDU (silvia.HIP.Berkeley.EDU [136.152.64.181]) by dfw-ix7.ix.netcom.com (8.6.13/8.6.12) with ESMTP id VAA20307; Mon, 2 Dec 1996 21:16:52 -0800
Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.8.3/8.6.9) id VAA27205; Mon, 2 Dec 1996 21:16:50 -0800 (PST)
Date: Mon, 2 Dec 1996 21:16:50 -0800 (PST)
Message-Id: <199612030516.VAA27205@silvia.HIP.Berkeley.EDU>
To: stable@freebsd.org
CC: gibbs@freebsd.org
Subject: 2.1.6 (sort of) ahc problem
From: asami@freebsd.org (Satoshi Asami)
Sender: owner-stable@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've had a -stable box (around 2.1.6) crash three times today under
heavy NFS load. (But it's only a 10BaseT network so it's not THAT
heavy....)

Here's one:

===
## gdb -k kernel.4 vmcore.4
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1e0000
current pcb at 1d2ddc
panic: free: multiple frees
#0  0xf019c10b in boot ()
(kgdb) bt
#0  0xf019c10b in boot ()
#1  0xf0116d53 in panic ()
#2  0xf010fc93 in free ()
#3  0xf0121d82 in m_freem ()
#4  0xf0151e18 in nfsrv_read ()
#5  0xf01601b3 in nfssvc_nfsd ()
#6  0xf015faba in nfssvc ()
#7  0xf01a4246 in syscall ()
#8  0xf01998eb in Xsyscall ()
Cannot access memory at address 0xefbfde70.
===

and another one:

===
## echo "bt" | gdb -k kernel.5 vmcore.5
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1e0000
current pcb at 1d2ddc
panic: free: multiple frees
#0  0xf019c10b in boot ()
(kgdb) #0  0xf019c10b in boot ()
#1  0xf0116d53 in panic ()
#2  0xf010fc93 in free ()
#3  0xf0121d82 in m_freem ()
#4  0xf0143526 in ipintr ()
#5  0xf019ad6d in swi_net_next ()
#6  0xf019a94d in Xresume11 ()
#7  0xf012938f in biowait ()
#8  0xf0127ae7 in bread ()
#9  0xf0180d05 in ffs_update ()
#10 0xf0183022 in ffs_sync ()
#11 0xf012e062 in sync ()
#12 0xf019c005 in boot ()
#13 0xf0116d53 in panic ()
#14 0xf010fc93 in free ()
#15 0xf0121d82 in m_freem ()
#16 0xf0151e18 in nfsrv_read ()
#17 0xf01601b3 in nfssvc_nfsd ()
#18 0xf015faba in nfssvc ()
#19 0xf01a4246 in syscall ()
#20 0xf01998eb in Xsyscall ()
#21 0x10d3 in ?? ()
===

It seems to die right after messages like this:

===
Dec 2 18:01:03 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6
Dec 2 18:01:03 stampede /kernel: Ordered Tag queued
Dec 2 18:01:08 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6
===

This is a P6-200 system with 6 8GB disks (Ultra-Wide, Ultra mode
disabled) on a single channel of 3940UW.

Sometimes it won't crash:

===
Dec 1 13:10:10 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0
Dec 1 13:10:10 stampede /kernel: Ordered Tag queued
Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0
Dec 1 13:10:15 stampede /kernel: ahc1: Issued Channel A Bus Reset #1. 1 SCBs aborted
Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): UNIT ATTENTION asc:29,0
Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): Power on, reset, or bus device reset occurred
Dec 1 13:10:15 stampede /kernel: , retries:3
Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): NOT READY asc:4,1
Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): Logical unit is in process of becoming ready
Dec 1 13:10:15 stampede /kernel: , retries:2
 :
===

(it recovered after a few hundred of these "NOT READY" lines.)

I guess Justin is going to update the ahc code soon, but just FYI.

Satoshi
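
For anyone wanting to check whether the ahc timeouts cluster on particular disks, here is a minimal sketch (not part of the original report) that tallies "timed out in ... phase" messages per device and bus phase. It assumes the log lines look exactly like the excerpts quoted above; the names TIMEOUT_RE, tally_timeouts and SAMPLE are hypothetical, and the regular expression is only a guess at the general message layout based on those two excerpts.

```python
import re
from collections import Counter

# Pattern modelled on the quoted syslog excerpts only, e.g.
#   sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6
# Other driver versions may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"(?P<dev>\w+)\((?P<ctrl>ahc\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"timed out in (?P<phase>.+?) phase, SCSISIGI == (?P<sig>0x[0-9a-fA-F]+)"
)


def tally_timeouts(lines):
    """Count timeout messages, keyed by (device, bus phase)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("phase"))] += 1
    return counts


# Lines copied from the excerpts in this message.
SAMPLE = [
    "Dec 2 18:01:03 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6",
    "Dec 2 18:01:03 stampede /kernel: Ordered Tag queued",
    "Dec 2 18:01:08 stampede /kernel: sd1(ahc1:8:0): timed out in message out phase, SCSISIGI == 0xb6",
    "Dec 1 13:10:10 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0",
    "Dec 1 13:10:15 stampede /kernel: sd6(ahc1:13:0): timed out in dataout phase, SCSISIGI == 0x0",
]

if __name__ == "__main__":
    for (dev, phase), count in sorted(tally_timeouts(SAMPLE).items()):
        print(f"{dev}: {count} timeout(s) in {phase} phase")
```

To run it against a real log, pass the open file object instead of SAMPLE, since tally_timeouts accepts any iterable of lines.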