From owner-freebsd-hackers Fri Mar 5 12:15:14 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from midten.fast.no (midten.fast.no [195.139.251.11])
	by hub.freebsd.org (Postfix) with ESMTP id 165E415151
	for ; Fri, 5 Mar 1999 12:14:57 -0800 (PST)
	(envelope-from tegge@fast.no)
Received: from fast.no (IDENT:tegge@midten.fast.no [195.139.251.11])
	by midten.fast.no (8.9.1/8.9.1) with ESMTP id VAA18648;
	Fri, 5 Mar 1999 21:14:35 +0100 (CET)
Message-Id: <199903052014.VAA18648@midten.fast.no>
To: sthaug@nethelp.no
Cc: stephw@xs4all.nl, freebsd-hackers@FreeBSD.ORG
Subject: Re: adaptec 2940u2w hangs on external disks
From: Tor.Egge@fast.no
In-Reply-To: Your message of "Fri, 05 Mar 1999 17:02:35 +0100"
References: <22812.920649755@verdi.nethelp.no>
X-Mailer: Mew version 1.70 on Emacs 19.34.1
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Fri, 05 Mar 1999 21:14:35 +0100
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> A possibly related problem we've seen here: FreeBSD sometimes needs a
> hard reset (hit reset button) to reboot, while a software reboot will
> hang during bootup.
>
> This happens on FreeBSD boxes with 3.1R or 3.1-STABLE, onboard Adaptec
> 7890 U2W controller. Various (Seagate, IBM) LVD disks on LVD chain,
> *and* DAT on single-ended chain. Using verbose boot, we see that the
> hang occurs while probing the DAT and/or the CDROM player on the SE
> chain.

I've also noticed this problem. Having more disks on a machine gives
a higher probability of a hang.

I'm using a serial console and options BREAK_TO_DEBUGGER in the kernel
config file. Sending a break to enter the kernel debugger does not
work once the hang has occurred.

The virtual NMI pushbutton does not work once the hang has occurred
(SMP kernel, IOAPIC reprogrammed to treat irq 3 as NMI to be delivered
to CPU#0).

Programming CPU#1 to run with interrupts disabled (and lapic.tpr set
to 255), sending about 100K IPIs/second to CPU#0 to sample the program
counter on CPU#0, does not help. CPU#1 stops running when the hang
occurs:

	e0122535 -> scsi_interpret_sense+0x1
	e011e59c -> xpt_release_devq+0x4
	e012ff21 -> ahc_action+0x1
	e021a002 -> splcam+0x46
	e01306a1 -> ahc_action+0x781
	e011f983 -> xpt_set_transfer_settings+0x7b
	e01302fb -> ahc_action+0x3db
	e012b501 -> ahc_find_syncrate+0x1
	e012b6ea -> ahc_update_target_msg_request+0xce
	HANG

	e0122586 -> scsi_interpret_sense+0x52
	e014725e -> free+0x3a
	e0130462 -> ahc_action+0x542
	e011f923 -> xpt_set_transfer_settings+0x1b
	e022aefa -> strncpy+0x16
	e011f9e3 -> xpt_set_transfer_settings+0xdb
	e0219fbc -> splcam+0x0
	e012b501 -> ahc_find_syncrate+0x1
	e012b6e7 -> ahc_update_target_msg_request+0xcb
	HANG

Going back to a UP kernel, adding limited debug output has resulted
in the following reconstructed call stacks when the hang occurs:

	camisr
	probedone
	xpt_action
	xpt_set_transfer_settings
	ahc_action
	ahc_set_width
	ahc_update_target_msg_request
	unpause_sequencer

and

	camisr
	probedone
	xpt_action
	xpt_set_transfer_settings
	ahc_action
	ahc_set_syncrate
	ahc_update_target_msg_request
	unpause_sequencer

where ahc_inb(ahc, INTSTAT) in unpause_sequencer seems to hang.

Removing AHC_ALLOW_MEMIO from the kernel configuration file caused the
hang to occur a few lines earlier (while setting a new value for
TARGET_MSG_REQUEST or TARGET_MSG_REQUEST + 1).

I'm using a modified splvm (which blocks cam interrupts) and a
modified splsoftcam (which blocks cam interrupts during device
probing), but this does not prevent the hangs.

- Tor Egge
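
For readers who want to work with the sampled program-counter traces, here is a minimal sketch of how a raw address can be turned into the "symbol+offset" form used in the message. It is not the tooling used in the post. The symbol start addresses are back-computed from the address/offset pairs in the traces (address minus offset) rather than read from a kernel image, and the names SYMBOLS, STARTS and symbolize are hypothetical.

```python
import bisect

# Symbol start addresses, back-computed from the traces in the message
# (sampled address minus the printed offset). This is an assumption-based
# table for illustration, not a dump of a real kernel symbol table.
SYMBOLS = sorted([
    (0xe011e598, "xpt_release_devq"),
    (0xe011f908, "xpt_set_transfer_settings"),
    (0xe0122534, "scsi_interpret_sense"),
    (0xe012b500, "ahc_find_syncrate"),
    (0xe012b61c, "ahc_update_target_msg_request"),
    (0xe012ff20, "ahc_action"),
    (0xe0147224, "free"),
    (0xe0219fbc, "splcam"),
    (0xe022aee4, "strncpy"),
])
STARTS = [addr for addr, _ in SYMBOLS]


def symbolize(pc):
    """Return 'name+0xoff' for the nearest symbol starting at or below pc."""
    i = bisect.bisect_right(STARTS, pc) - 1
    if i < 0:
        # pc lies below every known symbol: print the bare address.
        return "%08x" % pc
    base, name = SYMBOLS[i]
    return "%s+0x%x" % (name, pc - base)


# The last sample before each of the two hangs reported in the message.
for pc in (0xe012b6ea, 0xe012b6e7):
    print("%08x -> %s" % (pc, symbolize(pc)))
```

Because the table only holds symbols that appear in the traces, an address inside any other function would be attributed to the closest preceding entry; a complete symbol table from the kernel in question would be needed for real use.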