Date:      Thu, 22 Oct 1998 13:27:30 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        lkoeller@cc.fh-lippe.de (Lars Köller)
Cc:        freebsd-hardware@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
Subject:   Re: Still freeze with 3.0-RELEASE, PLEASE give me any suggestions!!
Message-ID:  <199810221927.NAA17486@panzer.plutotech.com>
In-Reply-To: <199810221222.OAA01073@odie.lippe.de> from Lars Köller at "Oct 22, 98 02:22:26 pm"

Lars Köller wrote...
> --------
> 
> Hello again!
> 
> I've just upgraded my 3.0CAM SNAP to 3.0-RELEASE with no problems.
> But the system still freezes soon after I enable
> X11. This is no proof that X11 is responsible for the
> problems, more of an intuition.
> 
> Again, the whole system is running very, very stably with 2.2.7!!
> 
> The hardware is a Tyan Titan Pro with 2x200 MHz PPro and 64MB RAM, 
> Matrox Millenium (8MB). (Xserver version, etc. see attachment).
> 
> I also changed the BIOS values to slower RAM access and disabled
> some features, no change. Andreas Klemm, who owns the same board, has
> no such freezes with the same BIOS settings! I've also reserved
> IRQ 15 for the video card (this can be set in the BIOS) because otherwise
> it's occupied by the Adaptec 2940:
> 
> ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 15 on pci0:13:0
> 
> Again no change (after this IRQ 15 is occupied by the vga device and 
> the Adaptec is on IRQ 2)! 
> 
> The system freezes with both SMP and NO-SMP, I've also changed the 
> RAM, the PPro slot 0->1, 1->0, no change at all. 2.2.7-RELEASE is stable, 
> 3.0 freezes with no message on the console, until today!!!
> 
> I got the following panic:
> 
> kernel: type 12 trap, code = 0
> Stopped at		_dasendorderedtag+0x15:	cmpl	$0,0xb4(%ebx)
> 
> db> show registers
> 
>  cs	       0x8
>  ds	0x582a0010
>  es	0xf01e0010	_vid_set_border+0xb8
>  ss	      0x10
> eax	0xc0000000
> ecx	0xf1c9ec38
> edx	         0
> ebx	    0x306c
> esp	0xf01ecf8c	_etext+0x2b4c
> ebp	0xf01ecf90	_etext+0x2b50
> esi	0xf01f1f8?	_dasendorderedtag       (sorry, address noted incorrectly)
> edi	0xc0000000
> eip	0xf010f20d	_dasendorderedtag+0x15
> efl	   0x10286
> 
> The whole upgrade/install (aout to elf) was done with X11 disabled.
> 
> Any suggestions are welcome!

Generally, a stack trace is more helpful than a register dump.  But I
think I've got an idea of what your problem is.

It looks like one of your tape drives is getting confused.  Try increasing
your bus settle delay from 8 seconds to 15 seconds.

The messages you attached show two boots.  In the first one (probably after
power-on) there are a number of error messages.  The second one looks fine.

What happened is that one of your tape drives responded on multiple LUNs in
the first boot, probably because it didn't have enough time to properly
initialize itself.  In any case, the inquiry information that came back
was bogus, and the device type number was 0.  So the da driver tried to
attach to the device in question.

When the da driver tried to attach, the drive sent back a message saying
that the particular logical unit (in this case, 3) wasn't supported:

Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): READ CAPACITY. CDB: 25 60 0 0 0 0 0 0 0 0
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): ILLEGAL REQUEST asc:25,0
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): Logical unit not supported
Oct 22 12:23:54 odie /kernel: (da4:ahc0:0:6:3): fatal error, failed to attach to device(da4:ahc0:0:6:3): removing device entry

The da driver then tried to de-register that peripheral instance.  The
problem is that there's a bug in the da driver w.r.t. invalidating
peripheral instances from the probe/attach code.  I've actually been
working on a fix for that bug since a co-worker discovered it on Tuesday.

What happens is that when the da driver invalidates a peripheral instance
from dadone(), that peripheral instance doesn't get removed from the list
of da softc's.  That list of softc's is traversed periodically by the
dasendorderedtag() function, which is called from a timeout handler.
When the da peripheral in question is removed, its softc is freed, but the
freed softc is still linked into the list.  The next time
dasendorderedtag() is called, the kernel panics because it dereferences a
dangling pointer while traversing the linked list of softc's.
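
To illustrate the failure mode described above, here is a minimal userland
sketch in C.  It is not the da driver code: struct softc, softc_attach(),
softc_detach() and softc_walk() are hypothetical names made up for this
example.  It shows the safe ordering, where an entry is unlinked from the
list before it is freed, so a later periodic walk never touches freed
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device softc; not the real da structure. */
struct softc {
    int unit;                /* peripheral unit number */
    int ordered_tag_sent;    /* set by the periodic walk */
    struct softc *next;      /* link in the global softc list */
};

static struct softc *softc_list = NULL;

/* Allocate a softc and link it at the head of the list. */
struct softc *softc_attach(int unit)
{
    struct softc *sc = calloc(1, sizeof(*sc));
    if (sc == NULL)
        return NULL;
    sc->unit = unit;
    sc->next = softc_list;
    softc_list = sc;
    return sc;
}

/*
 * Tear down a softc.  The entry is unlinked from the list BEFORE it is
 * freed.  Freeing without unlinking would leave a dangling pointer in the
 * list, which is the kind of bug described in the text.
 */
void softc_detach(struct softc *sc)
{
    assert(sc != NULL);
    struct softc **pp = &softc_list;
    while (*pp != NULL && *pp != sc)
        pp = &(*pp)->next;
    if (*pp == sc)
        *pp = sc->next;      /* unlink first */
    free(sc);                /* then free */
}

/*
 * Periodic walk over all softcs, standing in for a function called from a
 * timeout handler.  Returns the number of entries visited.
 */
int softc_walk(void)
{
    int visited = 0;
    for (struct softc *sc = softc_list; sc != NULL; sc = sc->next) {
        sc->ordered_tag_sent = 1;
        visited++;
    }
    return visited;
}
```

If softc_detach() skipped the unlink step, the next call to softc_walk()
would follow a pointer into freed memory, which in a kernel typically shows
up as a page fault like the one in the panic above.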

Anyway, try increasing SCSI_DELAY in your kernel from 8000 (8 seconds) to
15000 (15 seconds) and see if that fixes the problem.  If that doesn't
work, you can try disabling multi-LUN probing for your HP DAT drive.
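
For reference, a sketch of the kernel configuration change, assuming the
usual "options" line syntax of the kernel config file and a value given in
milliseconds as described above (the comment text is mine):

```
options		SCSI_DELAY=15000	# bus settle delay in ms (was 8000)
```

The kernel has to be rebuilt and installed for the new delay to take effect.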

I'll probably check in my patches to fix the panic in the next couple of
days.  That isn't the root cause of your problem, though.  I think one of
the above two solutions should fix it.

Ken
-- 
Kenneth Merry
ken@plutotech.com



