Date:      Fri, 11 Jul 2008 13:48:33 +0200
From:      "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To:        "Jo Rhett" <hostmaster@netconsonance.com>, "FreeBSD Stable" <freebsd-stable@freebsd.org>
Subject:   Re: how to get more logging from GEOM?
Message-ID:  <op.ud4lq7fo8527sy@guido.klop.ws>
In-Reply-To: <C278655C-4FFB-4A8E-9501-2B84283E324D@netconsonance.com>
References:  <C278655C-4FFB-4A8E-9501-2B84283E324D@netconsonance.com>

On Fri, 11 Jul 2008 09:59:33 +0200, Jo Rhett  
<hostmaster@netconsonance.com> wrote:

> About 10 days ago one of my personal machines started hanging at  
> random.  This is the first bit of instability I've ever experienced on  
> this machine (2+ years running)
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386
>
> After about 2 weeks of watching it carefully I've learned almost  
> nothing.  It's not a disk failure (AFAIK) it's not cpu overheat (now  
> running healthd without complaints) it's not based on any given network  
> traffic...  however it does appear to accompany heavy cpu/disk  
> activity.  It usually dies when indexing my websites at night (but not  
> always) and it sometimes dies when compiling programs.   Just heavy disk  
> isn't enough to do the job, as backups proceed without problems.   Heavy  
> cpu by itself isn't enough to do it either.  But if I start compiling  
> things and keep going a while, it will eventually hang.
>
> My best guess is that geom is having a problem and locking up.  There's  
> no log entry before failure to back this idea up, but I think this  
> because during boot I see the following:
>
> ad0: 286168MB <Seagate ST3300622A 3.AAH> at ata0-master UDMA100
> GEOM_MIRROR: Device gm0 created (id=575427344).
> GEOM_MIRROR: Device gm0: provider ad0 detected.
> ad1: 286168MB <Seagate ST3300622A 3.AAH> at ata0-slave UDMA100
> GEOM_MIRROR: Device gm0: provider ad1 detected.
> GEOM_MIRROR: Device gm0: provider ad1 activated.
> GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
> GEOM_MIRROR: Device gm0: rebuilding provider ad0.
>
> Every time it is rebuilding ad0.   Every single boot in the last two  
> weeks.
>
> Is there any way to get more logging from geom, to confirm or deny this  
> theory?
>
> Is there anything else I should be looking at?
>
> FWIW, this never happened before the p11 patch to 6.2.   I don't know if  
> that is related or not.
>
> Obviously, I can't upgrade to 6.3 if heavy cpu/disk activity kills the  
> system.
>
> No, I don't have any other insights.  I'm not prone to posting "duh help  
> me please!" posts, so I'm quite a bit frustrated by this one.


You can try dropping into the kernel debugger to see where it is hanging.  
Debugging over a serial cable is also quite easy.
I don't know the details, but there is a lot of information in the FreeBSD  
Handbook. Search Google for 'freebsd handbook kernel debug'.
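
As a rough, untested sketch of what that setup usually looks like on 6.x
(treat the details as assumptions and check them against the Handbook and
gmirror(8) before relying on them): the kernel needs the debugger compiled
in, and gmirror has a debug sysctl that should make it log more. The debug
level of 2 is just an example value.

```
# Kernel config additions (e.g. in a copy of GENERIC), then rebuild and
# install the kernel:
#   options KDB                  # kernel debugger framework
#   options DDB                  # interactive in-kernel debugger
#   options BREAK_TO_DEBUGGER    # assumption: lets a serial BREAK enter DDB

# Optional serial console, in /boot/loader.conf:
#   console="comconsole"

# More verbose GEOM mirror logging (gmirror(8) documents levels 0-3):
sysctl kern.geom.mirror.debug=2

# Enter the debugger by hand while the system still responds:
sysctl debug.kdb.enter=1

# Once the box hangs, try Ctrl+Alt+Esc on the syscons console (or a BREAK
# on the serial line), then at the db> prompt:
#   ps       # list processes and what they are waiting on
#   trace    # stack trace of the current thread
```

If the machine does not respond to the debugger key at all while hung, that
by itself suggests a hard lockup rather than something stuck in GEOM.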

Ronald.


