Date:      Wed, 17 Jan 2001 10:52:26 -0800
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Soren Schmidt <sos@freebsd.dk>
Cc:        Randell Jesup <rjesup@wgate.com>, arch@FreeBSD.ORG, current@FreeBSD.ORG
Subject:   Re: HEADS-UP: await/asleep removal imminent
Message-ID:  <20010117105226.V7240@fw.wintelcom.net>
In-Reply-To: <200101171842.TAA12276@freebsd.dk>; from sos@freebsd.dk on Wed, Jan 17, 2001 at 07:42:26PM +0100
References:  <20010117101342.R7240@fw.wintelcom.net> <200101171842.TAA12276@freebsd.dk>

* Soren Schmidt <sos@freebsd.dk> [010117 10:43] wrote:
> It seems Alfred Perlstein wrote:
> > > 
> > > I suggest creative manpower is used to stabilize -current, instead
> > > of fine-trimming which APIs should stay or not...
> > 
> > I started a loop of make -j128 buildworld and buildkernel last
> > night, I still haven't seen anything odd happen on my hardware.
> > 
> > You and Poul-Henning have to figure out what's going on, no one
> > else is able to reproduce this instability you're talking about.
> 
> Oohh you don't read the mailing lists then, there have been plenty
> of reports of hanging -current boxen since SMPng...

Yes, but none with anything useful. :(

> > There has to be a way for you guys to get us some reasonable
> > tracebacks or diagnostics instead of just saying "it's broke".
> 
> It's close to impossible, the two symptoms I see here are either
> spontaneous reboots, or solid hangs where only a reset can get
> you out, so I can't say much other than "it's broke".

You probably have a much better understanding of low level programming
than I do, you _should_ be able to figure out what's going on.

> > Perhaps you can explain how you're able to trigger this instability
> > with a test script?  Poul-Henning told me he just needed to do a
> > make -j256 world, I did 10 of them without a problem...
> 
> Hmm, with a -current kernel from today 1200 CET I just need to
> do a make depend on a GENERIC kernel, and wham it locks up.

Odd, doesn't hang for me.

> > I'd also like to see what hardware you guys are running on and what
> > kernel config.  I'm pretty sure that running with a weird value
> > for HZ causes lockups on -stable, dunno about current.
> 
> Nothing special, GENERIC kernel with SMP defined will do nicely, running
> without SMP improves matters but on the fastest machine I'm still getting
> lockups, but they are rare...
> 
> Hardware it hangs on here include:
> 
> 2*PPro@200 192MB FX chipset ATA disks on onboard controller (PIIX3)
> 
> 2*PII@350 512MB BX chipset SCSI disks on NCR controller
> 
> 2*PIII@1G 512MB ServerWorks chipset ATA disks on onboard + HPT controller.
> 
> It seems the faster the machine the faster the lockup/hang..
> 
> Need I mention that they all work just fine(tm) under -stable and
> -current back on PRE_SMPNG...
> 
> So, we (phk & I) are trying to figure out what is going on, but
> there is little to go on but hunch...
> So there is nothing special to it guys, you just have to try..
> Oh btw using a ccd/vinum/ATA-raid thingy makes the problem worse,
> probably due to the higher interrupt rates.

I will try stacking a vinum over vn striped setup later tonight
to see if this still locks up.

You're still not telling me what combination of vn/vinum does this,
so I guess I'll have to stumble around in the dark for a bit until
I find the magic combination to find the Danish panic/lockup?

I think phk just told me that you need a UP kernel to find this,
but he's being pretty vague about it so I don't know.

> > Basically if you're expecting me or the SMP team to figure out
> > what's going on without more info, you're pretty much out of luck.
> 
> See above, not really possible, we have been trying to find some
> (affordable) HW that could be used to preserve a log over a boot,
> but so far I haven't been able to find anything that works, and
> is fast enough to not affect the system too much...
> 
> > ...wondering if the box Paul Saab gave me is actually SMP... :)
> 
> Yup, that would explain things :)

Well, I do see processes migrating from CPU to CPU and there's the
dmesg:

FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000
SMP: AP CPU #1 Launched!
SMP: CPU1 apic_initialize():
     lint0: 0x00010700 lint1: 0x00010400 TPR: 0x00000010 SVR: 0x000001ff
start_init: trying /sbin/init

Dual 750MHz, 1GB RAM, atapci0: <ServerWorks ROSB4 ATA33 controller>
dual disks: ad0: <IBM-DTLA-307030/TX4OA50C> ATA-5 disk at ata0-master

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

