From: "Kenneth D. Merry"
Message-Id: <199810221927.NAA17486@panzer.plutotech.com>
Subject: Re: Still freeze with 3.0-RELEASE, PLEASE give me any suggestions!!
In-Reply-To: <199810221222.OAA01073@odie.lippe.de> from Lars Köller at "Oct 22, 98 02:22:26 pm"
To: lkoeller@cc.fh-lippe.de (Lars Köller)
Date: Thu, 22 Oct 1998 13:27:30 -0600 (MDT)
Cc: freebsd-hardware@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG

Lars Köller wrote...
> --------
>
> Hello again!
>
> I've just upgraded my 3.0CAM SNAP to 3.0-RELEASE with no problems.
> But there is still a freeze of the system soon after I've enabled
> X11. This is not proof that X11 is responsible for the problems,
> more of an intuition.
>
> Again, the whole system is running very, very stable with 2.2.7!!
>
> The hardware is a Tyan Titan Pro with 2x200 MHz PPro and 64MB RAM,
> Matrox Millennium (8MB). (Xserver version, etc. see attachment).
>
> I also changed the BIOS values to slower RAM access and disabled
> some features, no change. Andreas Klemm, who owns the same board, has
> no such freezes with the same BIOS settings!
> I've also reserved IRQ 15 for the video card (this can be set in the
> BIOS) because otherwise it's occupied by the Adaptec 2940:
>
> ahc1 rev 0 int a irq 15 on pci0:13:0
>
> Again no change (after this, IRQ 15 is occupied by the vga device and
> the Adaptec is on IRQ 2)!
>
> The system freezes with both SMP and NO-SMP. I've also changed the
> RAM and the PPro slot 0->1, 1->0, no change at all. 2.2.7-RELEASE is
> stable, 3.0 freezes with no message on the console, until today!!!
>
> I got the following panic:
>
> kernel: type 12 trap, code = 0
> Stopped at _dasendorderedtag+0x15: cmpl $0,0xb4(%ebx)
>
> db> show registers
>
> cs   0x8
> ds   0x582a0010
> es   0xf01e0010  _vid_set_border+0xb8
> ss   0x10
> eax  0xc0000000
> ecx  0xf1c9ec38
> edx  0
> ebx  0x306c
> esp  0xf01ecf8c  _etext+0x2b4c
> ebp  0xf01ecf90  _etext+0x2b50
> esi  0xf01f1f8?  _dasendorderedtag (sorry, address noted down incorrectly)
> edi  0xc0000000
> eip  0xf010f20d  _dasendorderedtag+0x15
> efl  0x10286
>
> The whole upgrade/install (aout to elf) was done with X11 disabled.
>
> Any suggestions are welcome!

Generally, a stack trace is more helpful than a register dump. But I
think I've got an idea of what your problem is.

It looks like one of your tape drives is getting confused. Try
increasing your bus settle delay from 8 seconds to 15 seconds.

The messages you attached show two boots. In the first one (probably
after power-on) there are a number of error messages. The second one
looks fine.

What happened is that one of your tape drives responded on multiple LUNs
in the first boot, probably because it didn't have enough time to
properly initialize itself. In any case, the inquiry information that
came back was bogus, and the device type number was 0. So the da driver
tried to attach to the device in question.

When the da driver tried to attach, the drive sent back a message saying
that the particular logical unit (in this case, 3) wasn't supported:

Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): READ CAPACITY.
CDB: 25 60 0 0 0 0 0 0 0 0
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): ILLEGAL REQUEST asc:25,0
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): Logical unit not supported
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): fatal error, failed to attach to device(da4:ahc0:0:6:3): removing device entry

The da driver then tried to de-register that peripheral instance. The
problem is that there's a bug in the da driver w.r.t. invalidating
peripheral instances from the probe/attach code. I've actually been
working on a fix for that bug since a co-worker discovered it on Tuesday.

What happens is that when the da driver invalidates a peripheral
instance from dadone(), that peripheral instance doesn't get removed
from the list of da softc's. That list of softc's is traversed every so
often by the dasendorderedtag() function, which is called from a timeout
handler.

When the da peripheral in question is removed, its softc is freed. The
next time dasendorderedtag() is called, the kernel panics because it
dereferences a pointer to nowhere when traversing the linked list of
softc's.

Anyway, try increasing SCSI_DELAY in your kernel from 8000 (8 seconds)
to 15000 (15 seconds) and see if that fixes the problem. If that doesn't
work, you can try disabling multi-lun probing for your HP DAT drive.

I'll probably check in my patches to fix the panic in the next couple of
days. That isn't the root cause of your problem, though. I think one of
the above two solutions should fix it.

Ken
-- 
Kenneth Merry
ken@plutotech.com
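
For illustration only, the failure mode described above (a softc that is
freed while still linked on a list that a periodic handler walks) can be
sketched in plain userland C. This is not the da driver code and not the
actual patch: struct softc, softc_attach(), softc_detach() and
send_ordered_tags() are hypothetical names made up for this sketch. The
point is the order of operations in softc_detach(): unlink the entry
first, then free it, so the walker never follows a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device softc; NOT the real da structure. */
struct softc {
    int unit;                 /* device unit number, e.g. 4 for "da4" */
    int ordered_tags_sent;    /* bumped on every periodic pass */
    struct softc *next;       /* singly linked list of all instances */
};

/* Head of the list walked by the periodic handler. */
static struct softc *softc_list = NULL;

/* Allocate an instance and link it at the head of the list. */
static struct softc *softc_attach(int unit)
{
    struct softc *sc = malloc(sizeof(*sc));

    if (sc == NULL)
        return NULL;
    sc->unit = unit;
    sc->ordered_tags_sent = 0;
    sc->next = softc_list;
    softc_list = sc;
    return sc;
}

/*
 * Tear an instance down safely: unlink it from the list BEFORE freeing.
 * Freeing without the unlink step would leave a dangling list entry,
 * which is the kind of bug described in the message above.
 */
static void softc_detach(struct softc *sc)
{
    struct softc **pp;

    assert(sc != NULL);
    for (pp = &softc_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == sc) {
            *pp = sc->next;   /* bypass the entry being removed */
            break;
        }
    }
    free(sc);
}

/*
 * Stand-in for the periodic handler: touch every instance on the list
 * and return how many were visited.
 */
static int send_ordered_tags(void)
{
    struct softc *sc;
    int visited = 0;

    for (sc = softc_list; sc != NULL; sc = sc->next) {
        sc->ordered_tags_sent++;
        visited++;
    }
    return visited;
}
```

With the unlink step in place, detaching an instance between two passes
of send_ordered_tags() simply shortens the walk by one entry.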