Date:      Mon, 01 Mar 2004 18:01:45 +0200
From:      Evren Yurtesen <yurtesen@ispro.net.tr>
To:        "David A. Koran" <dak@solo.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Same Panic 12 on differnet servers
Message-ID:  <40435E69.9000301@ispro.net.tr>
In-Reply-To: <40435588.6010604@solo.net>
References:  <40434197.8060100@solo.net> <40434F23.7070608@ispro.net.tr> <40435588.6010604@solo.net>

David A. Koran wrote:

> however, I'm remote and can't pick up the console messages (if somebody's 
> got a cool trick for that, I'd appreciate it). I'm usually recording 

Well, I don't know any way other than connecting another machine to the 
console port via a serial cable and recording everything, since in a 
panic the machine most probably can't write anything to its logs or to 
disk at all.
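
A minimal sketch of the recording side, assuming the second machine sees 
the serial line as a readable byte stream (for example a device node 
such as /dev/cuaa0, which is an assumption about your setup, with the 
line speed already configured outside the script). The function name 
log_console is made up for illustration. It timestamps each line and 
flushes it right away, so nothing already received is lost when the 
other box panics:

```python
import time

def log_console(stream, log, clock=time.time):
    """Copy lines from a binary stream to a text log, one timestamp per line."""
    count = 0
    for raw in iter(stream.readline, b""):
        # Console output may contain garbage bytes during a panic; keep going.
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        log.write(f"{stamp} {text}\n")
        log.flush()  # push every line out immediately
        count += 1
    return count
```

You would call it with something like 
log_console(open("/dev/cuaa0", "rb", buffering=0), open("console.log", "a")), 
where both paths are placeholders.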

> I cvsup about once or twice a day. I build world about once a week and 
> upgrade ports daily. I'm going through a portsupgrade right now and will 

But you said the machine had been up for 80+ days before it crashed? 
How could you upgrade the system and kernel once a week?
Perhaps your last known-working sources are at least 80 days old, and 
your problem is related to something that changed in the FreeBSD sources 
in between?

> The hardware is fine and has been working without a hitch. And, for the 
> case that I'm not sure EXACTLY when the last stable build occurred (I can 
> look at my saved daily logs for repeated reboots), I'm not going to have 
> much to go on right now. I was more or less soliciting any me-toos to 
> see if we can pin-point the issue. I'll post back on the progress of 
> finding out when this occurred (or started to at least).

Well, I once had a problem with a machine that had been working for more 
than a month without any trouble. Then it started rebooting etc. after I 
made world. I thought it was a software problem, but later on I realized 
that 2 of the memory modules were faulty. I would guess they just went 
bad at about the same time I cvsupped to newer sources. We shouldn't 
exclude the possibility of a hardware failure which causes a program to 
malfunction. When you are near the hardware you should perhaps run some 
memory and hard drive tests. www.memtest86.com has a nice utility, and I 
guess your drives support S.M.A.R.T. testing. It's a shame that the 4.x 
versions of FreeBSD can't work with smartmontools on ATA drives, 
otherwise you could run the test on the fly. I am sure there is a 
utility which can do SMART tests from a bootable floppy etc., though I 
have never used one. Hmm, let's see ;) *googling* Well, IBM/Hitachi seems 
to have some software. I don't know if it will work on your WD drives, 
but it's worth a shot.
http://www.hgst.com/hdd/support/download.htm
Even if your drives are not the problem, looking at the drive status 
doesn't hurt. I recently realized a few of my drives are going to fail 
pretty soon. It is just nice to know before that really happens.
You should run the extended (long) SMART self-test on the drive. It is 
the most thorough of the self-tests and the most likely to catch a 
failing drive.


>> Which process is using the cpu so much before crashing? 
>  
> This is post crash diagnostics, so, I'm not process monitoring yet.
>
>>> balanced combination of web and mail server on it. The load used to
>>> (and with some tuning) stays below 1.00 load, but I've seen it get to
>>> above 3.00 and start crashing.

Well, I just thought you would know, because you said the load gets up to 
3.00 before crashing... So you didn't check what was using so much CPU?
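
If it helps for next time, here is a rough sketch of picking out the top 
CPU consumers from ps output, so there is something to look at after a 
crash. It assumes the output of "ps -axo pid,pcpu,comm" is a header line 
followed by one process per line; the function name top_cpu and the 
sample data are made up for illustration.

```python
def top_cpu(ps_output, n=3):
    """Return the n busiest processes as (pcpu, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        parts = line.split(None, 2)           # command may contain spaces
        if len(parts) < 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((float(pcpu), int(pid), comm.strip()))
        except ValueError:
            continue                          # ignore lines that do not parse
    rows.sort(key=lambda r: r[0], reverse=True)
    return rows[:n]

# Made-up sample data, not taken from any real machine.
sample = """\
  PID %CPU COMMAND
  312  1.5 sendmail
  845 62.0 perl
  901 30.5 clamd
  120  0.0 syslogd
"""
print(top_cpu(sample, 2))
```

Run from cron every minute or so and appended to a file, this would at 
least show what was busy shortly before the box went down.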

> I have a ton of apps on the machine (it's a loaded webserver and mail 
> server, most of the load comes from SPAM and Virus scanning of incoming 
> e-mail right now).. so pin-pointing the offending app right now will 
> probably take more work.

Well, the problem might also be your spam/virus scanning software. 
Nowadays there are so many mail worms that when they start attacking, 
you can receive hundreds of emails at once. That might cause you to 
run out of memory etc. and use a lot of processor power and swap space 
too! Access to the machine would get really slow, and it might 
eventually end in a crash/reboot... This is just another 
possibility.

> Just this one (my backup test box [read: laptop] is out for hardware 
> maintenance... FreeBSD 5.x kept dying on it... urf!)

Well, the subject said 'on different servers', so I thought you had 
multiple servers with the same issue.

Evren



