From: John Baldwin
To: Oleg Sharoiko
Cc: freebsd-scsi@freebsd.org, Andrey Beresovsky
Date: Thu, 23 Mar 2006 11:46:27 -0500
Subject: Re: Boot hangs on ips0: resetting adapter, this may take up to 5 minutes
Message-Id: <200603231146.30510.jhb@freebsd.org>
In-Reply-To: <20060323092034.W795@brain.cc.rsu.ru>

On Thursday 23 March 2006 04:14, Oleg Sharoiko wrote:
> Hi!
>
> On Mon, 13 Mar 2006, John Baldwin wrote:
>
> JB>> To make GENERIC usable it's enough to comment
> JB>> options PREEMPTION
> JB>> Not sure if this helps much.
> JB>It could point to a bug in a driver.
>
> All this time I was doing experiments, but the more I did the less I
> understood. Now I'd say that I suppose the problem is not with a
> particular device, but rather with a number of devices installed in the
> system. The things are different depending on hardware setup and kernel
> configuration. Just a few examples:
>
> The only configuration which I've never seen failing was with no pci cards
> installed and several devices disabled in BIOS (mouse, floppy, ata, serial
> ata). This way the system boots fine with GENERIC kernel. As soon as I
> install additional scsi card (adaptec 29160) SCB timeouts start happening
> on internal scsi adapter during "Waiting 5 seconds for SCSI devices to
> settle". The system would still boot after "ahd0: Recovery Initiated -
> Card was not paused". If I remove bge driver from kernel (keeping
> additional scsi in system) this timeouts go away.
>
> The GENERIC kernel on the system with no pci cards and all devices
> enabled in BIOS sometimes boots and sometimes hangs with last line "lo0:
> bpf attached". The same happens with kernel without bge with the exception
> that for this one chances that it would boot are higher.
>
> When ips pci card is installed the GENERIC kernel would definitely hang
> at boot. Kernel without bge would boot almost for sure. On SMP kernel I
> was even able to kldload bge when boot have been completed. The same
> action on UP system produces rather strange results. If I boot to
> singleuser mode and load if_bge than the system returns to command prompt
> and I can edit command line and everything looks normal. But as soon as I
> try to execute something (I suppose disk io is a point here, but I'm not
> sure) the system becomes extremely slow.
> It takes about 30 seconds to
> print a single character on console. The same happens if I load if_bge in
> multiuser mode.

This points to an interrupt storm.

> One thing is common to all cases: when system hangs (or becomes slow)
> Ctrl+Alt+Esc wouldn't work, but sending break on com port still would and
> it's possible to get into kernel debugger. Unfortunately this doesn't help
> me. To be true I don't think I can cope with this on my own. I setup
> remote gdb for this box but it gives nothing to me, due to lack of
> knowledge on how interrupt delivery works and how interrupt handling is
> done in FreeBSD. Would it be possible for you, John, or maybe for someone
> else to look at this box. I can provide full remote access to it with
> remote gdb, serial console and ip kvm.

Can you drop into the debugger and do 'show intrcnt' after you have
triggered the interrupt storm from bge?

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org